Learn to Code a Game AI - Tesseract - TeamFightTactics - Part 4

Description

This is part 4 of my series on coding an AI to play TFT. It teaches how to do screen scraping with Tesseract in TypeScript. Other topics covered: OpenCV, TensorFlow, JavaScript, Lodash, text comparison with Levenshtein distance, open source, npm, Node.js, optimization, classes, constructors, variables, data types, lambdas, hashmaps, loops.

Transcript

Hey guys, this video is a continuation of my series on coding a bot to play the Teamfight Tactics video game. In this video I'm going to show you how to code the portion which extracts the useful information out of the game so that the AI can make decisions.

To start with, I've taken a whole bunch of screenshots of myself playing the game so that I can use them for testing and don't have to actually play the game while I'm coding. Here's the main screen that's most important, and you'll notice a bunch of elements that are useful to the player. For example, at the bottom are the units that I can select for purchase, so I need the AI to be able to extract this information out of the screen as I'm playing.

There are two ways I might consider doing this. One is to analyze the actual graphics, since each character has a different portrait. The other is to read the text, and to the computer reading the names is significantly easier, so that's the option I want to approach today. Then in a separate video, for the things that cannot be extracted as text, I'll use image analysis libraries to try to extract them from the 3D models and pictures instead.

To start our project, I'm going to open Visual Studio Code, open up the terminal, and change my directory to a folder I set up for this project. You can see that other than my screenshots there's nothing in here. When I run the npm init command it will generate some files for me, but first it's going to ask me some questions. After npm init is finished, it creates a file called package.json.

You may want to double check that your node is set up correctly. For example, I could type node -v and that'll give me my version number, which I just recently updated, so I know that's the latest. You can do the same for the npm command, so that's my version of npm. You may want to update both of these: node from the node website, and npm you can update using the npm install command
with -g, which means global. The package.json file will remember which packages you've installed, but if you use -g for global, then instead of my local package.json there's a global one that's shared by all of your projects, so you would use this for packages that you use frequently. This is a little bit weird: npm is used to install packages, but I can also use it to update itself, so I'll say npm and then the at sign with the version number.

Okay, so the next thing I want to do: I mentioned that I'm going to use TypeScript instead of JavaScript, so I'm going to say npm install typescript. You'll notice that in my package.json it automatically added a dependencies key, the first dependency is typescript, and it automatically determined the version number.

Now that we have TypeScript installed, we need to set it up. I can do that with the npx command. That also comes from npm, but the x stands for execute, so I can run these dependencies as if they were executable programs. tsc comes from the typescript dependency and it stands for TypeScript compiler. These types of commands that I'm showing you are pretty common. You'll use them a lot, so you can memorize them over time if you code frequently, and if not, if you just Google for example how to code a hello world program in TypeScript, they're going to explain these steps to you.

Since this is the first time I'm setting up my program, I'm going to run the TypeScript compiler command with --init. That has created a file for me called tsconfig.json. If I click to show all the files, you can see I have the package.json which npm init created, and now I have tsconfig.json which the npx tsc --init command created, and then the package-lock.json, which is from the npm install command.

Let's take a look at this tsconfig file. It's just a whole bunch of different settings for how TypeScript is going to compile. TypeScript is a superset of JavaScript, meaning that all
of the existing JavaScript code works, but you can use additional commands that are not built into JavaScript, and it all compiles down to JavaScript. It just adds new features to the JavaScript language.

Most of these settings I can leave at the default, but there are two settings I need to change to make my projects easier to code, so I uncommented them and I'm going to change them from true to false: noImplicitAny and strictNullChecks. These are extra rules that the compiler runs to check for warnings in your code, but most of the time the warnings are not pointing out a real bug, and I don't need the compiler to refuse to run my code just because it thinks there might be an issue.

So now TypeScript is set up so that we can run. Let me create the last file I need, which is index.ts. That's where our actual code is going to reside. Just to make sure TypeScript is working, I'm going to do a console.log statement: is typescript working, question mark. I'll open up the terminal and say npx ts-node .\index.ts. ts-node stands for TypeScript node, so this is going to compile and run my TypeScript file. If I hit enter, it runs my code and it says is typescript working.

You'll notice that it opened up this run window, and it will actually attach a debugger. If I put a breakpoint in here by clicking on the left side and then run it again, you'll see that it stopped at my breakpoint. It has not printed is typescript working on the screen yet, because this console.log has not run. Here inside the debugger I can choose to step through the code one line at a time, and on the left side I can see all of the variables that are in memory in my program right now. If I was trying to figure out whether my code was working, I might want to inspect these variables to see if they have the values I expect. I'll just continue so that the code finishes running.
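
As a rough illustration of what those two tsconfig settings control, here is a minimal sketch. The function names firstWord and shout are made up for this example and are not part of the project code. It is written so that it compiles whether the settings are on or off, and the comments describe what the compiler would complain about in the stricter mode.

```typescript
// Hypothetical helper (not from the video): returns the first word of some text.
function firstWord(text: string | undefined): string {
  // With strictNullChecks set to true, the compiler would reject calling
  // text.split(...) without this guard, because text might be undefined.
  // With it set to false, the guard is optional as far as the compiler is
  // concerned, but skipping it could crash at run time.
  if (text === undefined) {
    return "";
  }
  return text.split(" ")[0];
}

// With noImplicitAny set to true, a parameter with no type annotation,
// such as `function shout(word) { ... }`, would be a compile error.
// Annotating the parameter keeps both modes happy.
function shout(word: string): string {
  return word.toUpperCase() + "!";
}

console.log(firstWord("Gnar Gadgeteen Prankster"));
console.log(shout("is typescript working"));
```

Whether to leave these checks on is a judgment call. Turning them off, as done here, trades some safety for less friction while experimenting.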
The next step in our project is to set up the Tesseract API inside of TypeScript. I can do that with another npm install command, for node-tesseract-ocr, and you'll see in the package.json that it created the dependency on it. If I copy this name and pull it up on the npmjs website, I can read information about how this project works. It has, for example, the install command and some code for analyzing an image, and it also has a GitHub link where I could go to read additional help and see the source code. In the issues tab I could file bugs, and in the pull requests I could submit my own code if it improves on the library.

Something important to know about this particular npm library: Tesseract itself is the code for recognizing the text in an image, and OCR stands for optical character recognition, but this library is just a node wrapper. It does not contain the code for Tesseract itself. It expects you to install Tesseract separately, and it will just call the Tesseract executable. So if you want to install Tesseract, you would open up the separate GitHub page for the main Tesseract library, and it has an installer. I'm on Windows, so this is the installer I would run.

If you want to check whether Tesseract is installed on your computer, you can open up a command prompt and type the word tesseract. In my case it shows the help options for how to execute the tesseract command; for example, typically you pass in an image file name. If yours just says an error like tesseract command not recognized, then you don't have Tesseract installed and you'll need to download the installer from the GitHub page.

Now that we've verified Tesseract is installed on our computer, we can go back to the example code we were given and use that as a template to start our program. First I'm going to import Tesseract into my project. This is different from the npm install command. The npm install downloads the files into a folder called node_modules.
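
The command prompt check described above can also be sketched in code. This is a minimal example, assuming the executable is named tesseract and is on the PATH. The helper name isTesseractInstalled is hypothetical and is not part of node-tesseract-ocr. It only uses Node's built-in child_process module.

```typescript
import { spawnSync } from "child_process";

// Hypothetical helper: returns true when the `tesseract` executable
// can be launched from the PATH and exits successfully.
function isTesseractInstalled(): boolean {
  const result = spawnSync("tesseract", ["--version"], { encoding: "utf8" });
  // `error` is set when the process could not be started at all,
  // for example when the command is not found.
  return result.error === undefined && result.status === 0;
}

if (isTesseractInstalled()) {
  console.log("Tesseract was found on the PATH.");
} else {
  console.log("Tesseract was not found; install it before using the node wrapper.");
}
```

Running a check like this at startup gives a clearer message than letting the wrapper fail later when it tries to call the missing executable.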
If I open up node_modules and look at node-tesseract-ocr, I can see a bunch of files that were downloaded by npm. The import command is going to look at this index.js file and include that code inside of my index.ts file, so that all the Tesseract code stays in its own file but I can reference any APIs that are exposed by the Tesseract JS code.

Next I want to do a console.log statement on the results of executing Tesseract. Tesseract has a recognize function which will scan an image for text, and you'll notice that VS Code is able to show all of the parameters that the recognize function needs. Part of that is because we're using TypeScript, where functions are well defined with specific parameters and data types, so that gives me a hint on what type of parameters to pass in, which is really helpful.

I'm going to pass in one of these screenshots, and I'll open the image so that we can see what we expect it to be able to read. We want it to read words like Gnar, Gadgeteen, Prankster, etc. There are words all over the screen. I'm going to declare a config variable where I'll pass in the settings for Tesseract.

I have the file name, and I have to use double backslashes because JavaScript has escape characters in strings for special characters that can't be represented easily. For example, backslash r is a carriage return, like when you press the enter key. You can't really put an Enter key inside of a JavaScript string, because then the string is on two separate lines and that's going to confuse the JavaScript interpreter, so by using backslash r I can put an Enter key inside of a string. But because backslash is the prefix for a special character, you can't write a single backslash by itself. Instead I have to write two backslashes, which essentially turn into a single backslash when the interpreter removes the escape character.

The config settings I'm going to pass in I just copied from that example on the npm website, but I'm going to
use a language of English. OEM stands for OCR engine mode, so it's kind of like which version of Tesseract I'm using, and number one just means the newest engine. PSM stands for page segmentation mode, and I'm going to start with number one, which is automatic.

Let me give you an example of what page segmentation mode means. If I pull up a form like this, you'll notice that it uses red borders everywhere. This is designed so that it can be scanned by a computer, with each box having a specific purpose, and the automatic page segmentation mode that I chose is designed for this type of format.

The last parameter I need to send in is the list of characters that I'm allowing it to search for. For example, if I know that I'm only going to have numbers, then I can do this and it'll never give me any alphabet characters. I think I'll leave this parameter off for now and allow it to search for any characters.

Now this await command: if you press F12 on the function, it will take you to its definition, so you see the parameters that it wants as input and the value that it returns. You'll notice that the return type is a promise, and inside of the promise is a string. A promise is used for asynchronous coding, which means anything that does not return a result instantly. An example would be if I just say let math equals one plus one. The computer knows the answer to this instantly, so it doesn't need to wait on anything, and that's called synchronous. But if instead I said something like let download equals HTTP client google.com question mark, you know, JavaScript (that's just pseudocode), Google's not instant. It's got to send a message to my internet company, it's going to take multiple hops through the network to get to Google, then Google's got to search its database, then it's got to convert the result to HTML, and then it's got to send it back to me. And the internet was not designed to send large amounts of information at a time. Instead it
sends little packets, and you have to wait for all of the packets to arrive, and then your computer combines them to get the final result. So this command is not instant.

When we talk about instant with regard to a computer, you've got to think about your CPU. The typical CPU today has a clock speed of around four gigahertz, meaning it runs about four billion cycles in a single second. Calculating one plus one can probably be done in a single cycle. A more complicated programming command might take five or ten cycles, but that's still extremely fast; that's happening in nanoseconds. Downloading from Google might take half a second, and that's an eternity for a computer. So the code that's going to download from Google needs to be asynchronous, so that the CPU can keep doing other things while it's waiting for the internet response to come back.

Because recognize returns a promise, it will notify you when the result is available, so that your code can keep on running in the meantime. There are two ways I can handle this. I am using async await, so it's going to yield and allow other code to run until the result is available, and then when the notification comes in that it's ready, it'll continue to execute the next line of code. For example, if I do a console.log here that says done with the recognize command, this log will not run until recognize is finished. That allows me to code it as if it was synchronous while still not locking up the CPU.

An alternative would be to take the result of executing this function and put it into a variable. Then if I did a console.log here, I would say recognize started, because it has not finished yet. I could then call promise.then and pass it a parameter, and that parameter is another function, a callback. I do an additional console.log inside there saying it's done. This is where the final result is
available. It's a little bit weird, because as a human you expect to read things one line at a time, sequentially, but in this case it's jumping around: it'll run this first, then run the outer piece here, then run this line, then it'll jump back and run the inner piece, and this inner piece might run five seconds later depending on how long the recognize took. Promises are the base data type in JavaScript for doing asynchronous code, but it's a little confusing that things run out of order. That's the benefit of the await command: you can write your code as if it was sequential, and behind the scenes the compiler will rewrite it to use these promises. So I'm going to use the await command instead, and I'll just log the result that came from running recognize.

One issue here is that, the way this project is set up, the await command cannot be used at the top level of a node program. It has to be inside of a function, so I need to wrap this whole thing in a function. I'll say function test, outside of this test function I'll execute it, and I'll paste my code inside of test. And if you're going to use the await command inside of a function, you have to declare the function as asynchronous, so I'll put async at the front.

Okay, so let's run this code and see what happens. It looks like it ran, and it output everything that it was able to read off of the image, so let's see how it did. The first thing kind of looks like gibberish, iv-i-l-e-p-be. That's the first line it found, and then it found a whole bunch of other lines. Let's see if we can find this word in the image. I'm going to zoom in. Okay, I think what it was reading here was this player's name on the right side, and for the most part it read that word correctly. The only thing I would say it read wrong is that to me the V looks like it's capitalized, but Tesseract shows it as lowercase.

The next thing it read is backslash V, space, level, space, five,
exclamation. I think it got that from this level five exclamation directly underneath that player's name, so the level five exclamation is correct. However, this backslash V space does not match at all. You'll notice in the picture there's this weird arrow icon. Because that arrow has a well-defined shape, I think it's analyzing that and thinking it's a letter, and because there's not a letter in the alphabet that perfectly matches that shape, it's just giving its best guess. So that's one issue we're going to have to deal with: how to keep it from getting confused by these random graphics, because if that icon can confuse it, then could these pictures of weapons confuse it too? I'm not going to look at every little thing it read, but it seems like it's getting some of it correct and some of it wrong.

Let's think about how we can fix this. I chose the page segmentation mode of automatic, which I mentioned was designed to scan a document like this with a specific template, but TFT isn't exactly like this, and probably a lot of the random graphics are confusing it. So what if instead we took our image and cropped it to just the specific part that we wanted to read? Could it then read it correctly? For example, here we have Gnar, Lucian, Alistar, Jax, and Ashe. Let's see if any of those were read correctly. As I scroll through the results, I notice almost all of them are here at the bottom: Gnar, Lucian, Alistar, and Jax, but Ashe is missing. So I crop just the word Ashe, make a brand new image, paste the portion I cropped, and save it as cropped.png.

Something to keep in mind here: when I save, I want to use PNG instead of JPG, because a JPEG file uses lossy compression. That matters because a library like Tesseract uses mathematics to analyze the image, and anything that modifies the image may reduce the quality of the results that Tesseract gives us. So we want to keep it the same as how the original game rendered it, and then if we get poor
results, we want to be very careful about which algorithms we run to try to improve them. We don't want to accidentally modify the image without being purposeful, because we care more about accuracy than we do file size in this situation. In addition to that, you'll notice my screenshots are all 1920 by 1080. Whatever your monitor's screen resolution is, you want to make sure that you take the screenshot from the game in that native resolution and don't resize it. And again, I've saved as PNG because later on, when we start analyzing 3D models or icons for weapons, that same concept is going to apply, so we want to make sure we don't distort the image in any way before we analyze it.

Okay, now that we've got this cropped image, let's rerun our recognize code and see if it's able to find that word, because when it was trying to find it in the overall large image it was not able to. Right here we see it got the word Ashe. So the first thing we've learned is that we don't necessarily want to trust the automatic page segmentation mode in our specific example of scraping a game. We're going to need to crop everything on our own, based on the location where we expect to find the right text. For example, I need to pull up this image in a paint program, get the coordinates for the UI elements I care about reading, and make a list of all of them. Then I need to loop through that list, crop the image multiple times, and on each cropped image run Tesseract to get just that one word.

If I'm going to do that, then instead of using the automatic page segmentation mode, which is number one, I'm going to use seven, which looks for a single line of text. Also, in order to avoid some of the gibberish, like this backslash V, if we know we're never going to be looking for a backslash, then we want to put in this dictionary of allowed characters to avoid it finding characters that we know are not accurate. But we'll want a different dictionary for each
part of the UI. For example, we know that the name won't necessarily have a number, and the gold won't have a letter. And some UI elements we don't necessarily care about. For example, we need to know that we have five gold, but we don't need to know that Gnar costs three gold, and we don't need to know that Gnar is a Gadgeteen and a Prankster, because we can hard code who the character Gnar is and what his attributes are. That's not going to change from game to game, so we really just need to know that it's a Gnar, and that uniquely tells us all of his attributes. So let's make some code changes.

Just for informational purposes on how Tesseract works, I'm going to pull up this image. Essentially, when the computer looks at an image like this flower on the left, there are just too many colors and too much noise, and it's hard for it to know what this picture is. So the computer first runs some math over the image to modify it. It uses an edge detection algorithm, which looks for large changes in contrast or color, and converts it to this middle picture, which is just black and white and shows the sharpness of the edges. That gets rid of a lot of the noise so that the computer can see the overall shape.

Once the noise has been removed so that just the shape is obvious, Tesseract runs it through something called a neural network. A neural network is modeled after how scientists think our brains work, with synapses that fire and send messages to each other. If we want it to recognize that this is a flower, we train the network based on this input and we tell it what output we expect. The training goes in reverse: think of these middle layer nodes, whose internal mathematical functions can be changed. And for the output, instead of it just saying the word flower, maybe we want it to give a percentage likelihood that it is a flower. If you run this training process millions of times off of thousands of
images, and maybe a hundred of those images are flowers and the others are random images of anything except a flower, then it can find a pattern where only the flowers come up with a high probability and everything else comes up with a low probability. It's never going to be perfect, but it can be close. Tesseract uses this concept, but instead of looking for flowers it's looking for shapes in the alphabet. So you can think of it as training one network to recognize capital A and a different one to recognize lowercase a, running each of them over your image, then looking at all the probabilities and picking the highest one to say this is what letter we think this is. So that's roughly how Tesseract works.

Let's modify our code to be able to read the rest of this screenshot. I'm going to install two more npm packages. First, npm install @u4/opencv4nodejs. OpenCV stands for open source computer vision, and similar to Tesseract it's a computer vision library, but instead of being specific to reading letters it's more generic and can read a whole bunch of different things. I'll go into more detail on that library in a future video. In this video I'm just going to use it for cropping images.

I'm also going to install @tensorflow/tfjs-node. TensorFlow is a machine learning library. I just explained how Tesseract works using a neural network, and neural networks are one type of machine learning algorithm, so TensorFlow could be used to make a neural network, and it could be used to make something like Tesseract. But machine learning requires a lot of complicated math, so TensorFlow has some math functions that we'll find useful. In particular it has a lot of matrix functions, so we're also going to use it to help with the cropping, because when you think about it an image really is just a matrix of colors. It's a two-dimensional array of rows and columns of colors, which can be represented in a matrix. So in college,
if you get into high-end math, there's a subject called linear algebra, and that's basically all matrix math. Matrices are really good for computer vision and graphics.

Most npm libraries come with everything you need when you say npm install, but unfortunately, just like Tesseract, which we had to install separately, OpenCV is the same way. So we're going to need to go to opencv.org/releases and download the Windows installer for the latest version, otherwise when we try to run any OpenCV code it won't work. I've already done that, but if you're following along you'll need to do it too.

Let's write our code to crop an image. I want a function called crop image, and I want to pass it an image as input along with the coordinates where it's going to be cropped. For the data types, I'm going to use a cv.Mat for the image itself, cv from the OpenCV library and Mat meaning a matrix. I'm getting a red error on cv because I have not imported the cv node module into my index.ts code, so I'll say import star as cv from @u4, and it looks like it'll autofill for me.

The asterisk means to import everything and put it all underneath a variable called cv, so that cv dot gives me access to everything inside of the cv library. If you know for sure that you only need specific functions, for example just this Mat, then instead of star as cv you could put in curly braces a comma separated list of the specific APIs inside the node module that you need. But I find it much easier, especially while you're still experimenting, to use the asterisk so that you get everything. The downside is that you're going to get a lot more code than you need imported into your program, so when it compiles your executable is going to be larger, and when you give it to your users they have to download more and the code's going to run a little bit slower. However, that was more an issue
in the past when JavaScript compilers were rudimentary. Now when you compile, they use a technique called tree shaking: they analyze your code to find which functions are never used, and they can remove them from the final compiled program to keep it small and fast. But the compilers are not perfect, mainly because JavaScript was not originally designed for these types of complex programs, and JavaScript has features that prevent tree shaking from working very well. So if you really want perfectly fast performance and small load times, you should pick individual APIs to import instead of everything. That's mostly important if you're building a web app, because web apps need to be really fast.

Okay, so my crop image function is going to take a computer vision matrix as input and return a new computer vision matrix, and I want the other inputs to all be numbers. The computer vision library should have a crop function, and if I Google for it, it does exist, but for some reason this particular node.js version does not seem to have it. That's why I'm also importing the TensorFlow library: I'm going to use it to do the actual cropping. So I'm going to import TensorFlow and give it the prefix tf, importing from @tensorflow/tfjs-node.

Inside of my crop function I'm going to make a variable for the result of cropping. I'm first going to call convert to tensor on my cv.Mat image. This is a function that doesn't exist yet; I'm going to define it so that it takes a cv.Mat and returns a tf.Tensor. A tensor is also a matrix, it's just a TensorFlow matrix instead of a computer vision matrix, and each of these types of matrices has its own APIs. The TensorFlow one has a slice function, which you could think of as a crop, so I'm just going to pass it the coordinates that I want to slice, and my image has now been cropped. It probably feels odd that I'm putting y comma x, but that's because of the way TensorFlow stores its matrices internally: it
does it by rows first and columns second, and that's also why I put height before width. Now that my image has been cropped, I'm going to convert it back to a computer vision matrix, so I'm going to return tensor to CV on my cropped variable. I also need to define that function: var tensor to CV equals function, taking an image of type tf.Tensor, with an output of cv.Mat.

Okay, but these functions are just stubbed, so I need to actually code them now. I'm going to execute the tf.tensor function, which will create a new tensor for me, and I need to pass it my data. The cv.Mat has a getData function which returns a buffer, which is kind of like an array. Then I need to define the format that this array is in so that TensorFlow knows how to interpret it. I'm going to pass it the number of rows in my image, the number of columns in my image, and the number of values at each location in my matrix. That's 3, because I have the colors red, green, and blue, and each of those is represented by a separate number. Many images also have an alpha channel, which is for transparency, but in my case I'm choosing to ignore that.

Okay, so that's how to convert in the one direction. Now let's convert the other way. I can do that by calling the cv.Mat constructor function. Any time you call a constructor you need the new keyword, and a constructor is a function inside of a class that sets up the initial variables of the class. The tf.tensor function is almost identical in that it's setting up a tensor, but it was not defined by the API creators as a constructor, so you don't need the new keyword. Internally, though, both of them are just initializing the starting variables for that data type.

I'm going to pass it my height and my width. Tensors aren't only for matrices, but they can be used for matrices, and that's why instead of rows and columns like cv has, a tensor uses a shape array, and each index in that shape array tells you the length of that specific axis. So in my case, since
it's a two-dimensional image, or matrix, I just have two axes, X and Y, or rows and columns. So I just have shape zero and shape one, and that's basically telling me the height and the width.

Then I need to tell it what format the data for the image is in. That's similar to how we told TensorFlow that we had three numbers to represent our colors. TensorFlow just thinks of them as numbers, because it's not an image library; it doesn't know we're giving it an image, it just thinks we're giving it an array of data like a matrix. cv, on the other hand, even though it stores things as a matrix, is specifically built around images, so it knows that those numbers represent colors. Because cv is an image library, there are lots of image formats. For example, I mentioned that we're using 32-bit colors. We could also use 16-bit colors, which would lower the quality of our picture but make the file smaller, and we could use four numbers if we wanted an alpha channel, but in our case we don't want an alpha channel, so we're just going to use three. So I need to tell it which image format this data is in, and it's important to get this correct, because if I told it I only have two colors, or that my colors are 16 bit instead of 32, it's going to misunderstand the data and my Tesseract results are going to be completely wrong.

Finally, we need to get the data from TensorFlow and pass it to the cv.Mat function. I think I'm going to rename this image variable to tensor, because OpenCV really is designed around images but a tensor is really just designed around arbitrary data, so renaming my variable makes it less confusing. Then it's giving me a red line here, because TypeScript is designed to make sure I don't pass incorrect data to functions, since those would usually be considered bugs. But it's misunderstanding what data type this value is in, and the cv.Mat function expects it to be in a specific format, so it's giving me an error saying that
I'm passing it data the function is not expecting I can fix that by just typecasting it to any and this isn't actually reformatting the data in any way it's basically just telling it to ignore whatever error it thinks it found so if it is an actual error when I run the code my code might crash but if I'm confident I did it correctly then this is fine to do so now that I have this I just need to put into a variable so I can return it for my function but there's one issue which is I imported this as 32-bit signed which is the format tensorflow wants but actually the format that I want computer vision and Tesseract to use is 8-bit unsigned so I'm going to convert it right here with the convert to function and I'll pass it the 8-Bit unsigned three color parameter because that's the format I want to be using so it might seem wrong that we're using eight bits for the colors but it's actually fine so first of all it makes sense that we use unsigned because we don't need negative numbers for colors and second 8 times 3 is 24 so it's actually 24 bit color the standard is 32-bit color and that's because you normally have an alpha Channel which is your last eight bits but we don't need that in our case and so by storing it in the smallest form possible it's going to make CV a little bit faster when it does its math so essentially we're using eight bits for red and additional 8 Bits for green and an additional 8 Bits for blue and so it's 24 bits in total so the original format that we're reading it in from with tensorflow was tensorflow was thinking of the red as 32 bits by itself and the green as an additional 32 bits so that's actually way more colors than is normal and we don't need to support that and it's going to make our math way slower but that's just the downside of converting back and forth is this get data and tensor functions don't let you specify that you just want eight bits and 32 bits is standard for just a normal variable so it's basically converting each 
individual color to its own variable instead of keeping all three red green and blue combined as a single color okay so now that we have our crop image function set up let's see if we can read this word Ash out of this image but by cropping it with code instead of doing it by hand in paint one thing I forgot to mention is this Sync right here on buffer when I was using Tesseract before I used the async and await commands tensorflow and opencv can also be slow just like Tesseract depending on what function you're calling and so they have options where you can choose to call the synchronous version or the asynchronous version and for simplicity's sake I'm choosing to use the synchronous version because it allows me to write slightly simpler code which is easier to explain but standard practice is if there is an asynchronous version available it's better to use that and to use the async await commands because it will allow your program to multitask and so if you code it that way and you have multiple things that need to execute at the same time it's possible for it to do that and also it's going to free up the core on the CPU to be able to do other tasks but in my case I'm not going to worry about that small performance impact because I just prefer the simpler code so instead of the cropped PNG I'm going to go back to the other image I was recognizing and instead of recognizing the image file name I want to pass it an image so first I need to load this image into a CV matrix variable so I'll say let image equals cv.imread so that stands for image read I'll pass it the file name and then it wants flags to tell it which type of format it wants to be read in so I'm going to say IMREAD_COLOR so it'll automatically read the header inside of the PNG to figure out if it's 32-bit or 24-bit and if it has an alpha channel I don't need to worry about that but this parameter tells it after it's read all the image what format it should keep it in because it's 
common for computer vision techniques to not care about color and to just read it as grayscale and so rather than requiring you to convert the image after the fact it lets you pass that in as an initial parameter as it's reading so now that I've read the image I want to crop it but I need to know the coordinates I'm going to crop it at so let me open my image and scroll down to where the word Ash is and I'm going to select the rectangle and I want to code this for other words besides just ash because like Alistar is a little bit wider than Ash so I'm going to draw a rectangle on it that's a little bit big after I draw that rectangle at the bottom it says 138 by 22. so that's going to be the width and height I want for my cropping so I won't specify the X and Y at just the width and height 138 and 22. now to get the X and Y I'm just going to put my mouse cursor on the top left corner of that rectangle and you'll notice on the bottom left corner of the screen as I move my mouse it tells me what x and y coordinate I'm at so if I just put it at the top left of that rectangle it says 1290 by 1042 so let me type that in 10 42. 
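The rows-before-columns layout and the crop rectangle described above can be illustrated without either library. This is a minimal sketch in plain TypeScript, not the OpenCV or TensorFlow calls used in the video: `cropRgb` is a hypothetical helper, and it assumes a flat, row-major buffer with three values (red, green, blue) per pixel and no alpha channel.

```typescript
// Hypothetical helper, not part of OpenCV or TensorFlow: crops a flat,
// row-major RGB buffer (3 values per pixel, no alpha channel).
const CHANNELS = 3;

function cropRgb(
  data: Uint8Array,
  rows: number, // image height
  cols: number, // image width
  x: number,
  y: number,
  width: number,
  height: number
): Uint8Array {
  if (x < 0 || y < 0 || x + width > cols || y + height > rows) {
    throw new RangeError("crop rectangle falls outside the image");
  }
  const out = new Uint8Array(height * width * CHANNELS);
  for (let row = 0; row < height; row++) {
    // Rows come first: skip (y + row) full rows, then x pixels into that row.
    const start = ((y + row) * cols + x) * CHANNELS;
    out.set(data.subarray(start, start + width * CHANNELS), row * width * CHANNELS);
  }
  return out;
}

// Tiny 2-row by 3-column test image; pixel i stores the value i in all channels.
const tiny = new Uint8Array(2 * 3 * CHANNELS);
for (let i = 0; i < 6; i++) tiny.fill(i, i * CHANNELS, (i + 1) * CHANNELS);

// Crop the right two pixels of the bottom row, which are pixels 4 and 5.
const cropped = cropRgb(tiny, 2, 3, 1, 1, 2, 1);
console.log(Array.from(cropped)); // prints [ 4, 4, 4, 5, 5, 5 ]
```

For the 1920 by 1080 screenshot, the same helper would be called as `cropRgb(data, 1080, 1920, 1290, 1042, 138, 22)`, with the height passed before the width to match the row-major layout.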
okay so now my image is cropped so I'll put that into a variable and I want to pass it to this recognize function but I can't because it expects a file name so I'm going to need to make a new function that takes a file name and so even though this was cropped in memory after I read the file I'm gonna have to save it to a file and that's not ideal it would be better if I could pass the actual memory to Tesseract and if I really needed to optimize the code then I could figure out a way to do that but it's just not the way that this particular node version of Tesseract was designed so although it would be better to actually send it the raw memory it should be fine I have a solid state hard drive so it's going to be almost as fast as RAM and there's a concept in coding called premature optimization and you want to avoid that so although I know this is not the most optimal way to code it it is the easiest way to code it and your time as a programmer is more valuable sometimes than the speed of the computer because having slightly less optimal code might mean it takes an extra half a second to execute something well maybe that's not a big deal it depends on my users and what they're expecting and salaries of programmers are pretty expensive so buying a new computer might be cheaper than paying a programmer to spend 10 hours to recode something a different way so I'm going to make a new function instead of this tesseract.recognize and I'll call it OCR image equals function I'll have to make it async because Tesseract is a sync and instead of passing in a file name I'm going to pass in a computer vision Matrix and instead of having this hard-coded config variable I want that to actually get passed into my function Tesseract config and I'll make it a type any because I don't want to define a data type for this just because it'll be more work than it's worth and then I'm saying equals undefined to give it a default so that I can still make this config here the default I'll 
just move it and rename it to default Tesseract config so that way in the situation where I'm trying to read Ash I know it's only alphabetical so I can pass in that dictionary but when I'm trying to read in gold I can pass in a dictionary of just numbers so that's what this config is going to be used for and then finally I'm going to have it return a promise of string just like the original Tesseract so I'm going to write an image with CV to a file so this image parameter will get passed in and then also the file name so I'll just save it to like C:\temp\image.png and I'll put that into a variable so that I can also pass it to Tesseract so temp image file name equals that path and I'll pass the variable in and then after I've saved the file I can just move this Tesseract code up to right here after the file has been created so I'll pass in the Tesseract config that was passed to my function but if it's undefined I'm going to use the or command to override it from undefined to this default and instead of the crop image I'm going to use my file name and then I'll just return this so now I have my OCR image function I can call that inside my test function on the cropped variable and for now I'll just leave the config blank so I won't even pass in a second parameter I could say comma undefined but if I just leave it off it'll do the same thing because I gave it a default of equals undefined so I think that's all the code changes I need to make I'm expecting that when I run it I should get the word Ash returned so let's try it out npx ts-node index.ts okay I got an exception so it looks like I did not configure opencv all the way I forgot a step so in my files if I look at my package.json there are settings I need to put in here to say where opencv is installed to on my computer so I mentioned that you need to go to their website to install it and I have already installed it but I did not set up the configuration so on my computer I installed it to the C drive tools opencv build so in 
package.json and you could copy this from their documentation online but you need to take the name of the library opencv4nodejs and make it a new key in your JSON file and then you need to make three parameters called includeDir and libDir and binDir and each of those needs to have the correct folder specified and these all need a prefix of opencv so my include dir is build slash include and my lib dir is build x64 vc14 lib and my bin dir is just one level up from there and then into the bin folder and I mentioned before that anytime you use a backslash in a string in JavaScript and this is a JSON file so it's similar to JavaScript because backslash is an escape character you need to use it twice to remove the escape character and just make it a normal backslash oh I forgot one more parameter disableAutoBuild one so one meaning true so what happens is the opencv node package the first time it runs it's going to try to compile the code but in order for it to compile it needs additional compiler tools installed and I don't want to set that up in particular because a lot of times these packages were designed to be run on Linux not Windows and so setting up those compiler tools will be a little bit of extra work to hack it to make it work with Windows and so I'm just going to take the pre-compiled binaries so that it doesn't need to compile it on the fly and so that's why I had to download it online so now that I've got that set up I expect it to be able to run so run one more time fingers crossed okay so it says that it's outputting a promise so this OCR image I forgot because I made it async I'm actually outputting the fact that it's waiting for the response so that's what I mentioned before if I'm going to use the promise I need to call then and make an inner callback function where the result gets sent but instead of doing that I just need to add the await command here that I forgot so run it one more time now I have the word Ash perfect so now instead of me having 
to crop it by hand it's being cropped directly in code and then it's able to read the word and if you remember I ran an analysis on this image directly with Tesseract without cropping and it was not able to find the word Ash because all of the other Graphics we're confusing it but by cropping just to the spot where the word is it's able to read it just fine so now we have a framework that we can read any of the UI pieces that we want we just need to get a list of all the different coordinates and create a data structure to easily store those coordinates and then just Loop over the coordinates to read each one out to the screen so that'll just be our last little piece okay so first I'm going to set up different dictionaries for Tesseract so that the gold can use just numbers and the champion names can use just letters so first I'll make a variable that has a list of all of the alphabet characters and I just typed the uppercase ones so I'll make a separate one for the lowercase ones but rather than retyping them I'll just use the two lowercase function and then I'll make one that's for both so I'll say Alpha equals lower plus upper to concatenate the two and then I'll make one one for just numbers and I've already pre-typed that here and then I'll make one for Alpha and numbers so I'll just use these constant variables as default dictionaries depending on which control I'm using and then if any control happens to use different characters like dashes or slashes then I'll just Define a custom dictionary for that specific control so I'll make the default be alphanumeric next I want to define a list of all the controls that are on my screen so I'm going to make some data types to represent locations on the screen the question mark signifies that it's nullable meaning that it can be undefined so I might have some UI elements where I don't know the location ahead of time for example if I'm searching for an item on the screen I might know the width and height of the item 
and the image of what it looks like but I might not necessarily know the location so that's why I'm making it undefined and in order to make my data types more reusable rather than making a type that is the entire UI element I'm just doing a coordinate by itself as kind of like a base class and then I'll do a separate class or data type for like the width and height and other parts so for example I can say a rectangle has a coordinate of X and Y but in addition it has a width and height and then I can say a UI element is a rectangle but it also has an image which is a computer vision Matrix and maybe the text that I want to search for within that rectangle so the image would be if it's an icon for a weapon and finally if it needs a custom dictionary for the OCR to recognize so now that I have my data type set up I can make a list of controls which will be the locations on the screen that I want to scrape so I'm going to Define it as a JavaScript dictionary type or a hash table so the key will be a string and the value will be an UI element so for example the player's gold has a specific X location Y location width and height and then the dictionary it will use will just be numeric so what I'm going to do is open up this image in a paint program and just like I did before to find the coordinates for ash I'll do the same thing for each of the pieces on the screen and I'll copy the coordinates into the code so I think the pieces that I'm going to want is right here which is the player's level right here which is how much experience they have until they gain their next level up here at the top says which round of battle that they're in and also here at the top it says how many seconds are left in the current battle and then down here it shows how many wins or losses they have in a row and then of course we have all the character names everything else that's on screen I think I can ignore for now I will also need to scrape data out of images themselves which I'll show 
in a separate video for example which items I have available to equip to my characters so now that I've got the coordinates from the paint program I'm just going to type them in real quick okay they're all typed so the gold has these coordinates and we'll use numeric the experience will use these coordinates and in addition to numeric it will have a slash symbol because you'll notice that it's 4 out of 24 20 for my current experience and then the level has these coordinates and will also be numeric the battle round which is at the top here the 2-3 it'll be numeric with a dash the timer has these coordinates and will be numeric and the streak count will have these coordinates and numeric so that will work for all the numeric ones next I need to do the ones for the names of the heroes available to purchase in my deck so I'll type those in real quick okay I've got them all typed in now so the Y width and height will be the same on all five cards the only difference is the X will just be a couple hundred pixels shifted to the right for each one and they'll all use the alpha for the dictionary so I've got card Choice one through five so now that my coordinates are all typed in instead of calling this OCR image on the hard-coded coordinates that I'm using for cropping which I was using to read for ash instead I need to Loop over my entire controls dictionary and crop to the specific coordinates of that control and then output that to the screen so to do that I'm just going to do a for Loop that Loops over each key in the dictionary and from each key I'll pull out the value and then I'll call the same crop and OCR code I was calling before I'll just put it inside of the for Loop and instead of part of coding the coordinates for cropping I'll use the element from the dictionary so it'll use the x value y value the width and the height so now the image has been cropped and then I'm going to OCR it but when I OCR it I also need to send the OCR dictionary and then I will log 
the name of the control along with the result from the OCR right here where I pass in the dictionary to OCR image I should actually be merging the default object with my new object a library that I like to use for just general JavaScript utility functions is called Lodash so I'm going to say npm install lodash so I'm going to import that from lodash and if I were now to type underscore dot it looks like the typings didn't come in let me also install the typings so if I do npm install @types/lodash that should get the type information and I probably should do the same for node which is going to have some JavaScript utilities that are specific for making applications outside of the browser part of the node framework but okay that's better now I have all these different utility functions that I can use they were available to me even before I installed the types package for lodash but the types package gives me autocomplete as well as the parameters that each function needs and documentation so it's going to be much easier to code with but all of these different utility functions I just find incredibly helpful for every project I work on because it kind of adds additional functional concepts to JavaScript and I really like functional programming what I can do with this dictionary now is I can say underscore.merge and I can pass in an empty object and then I can pass in the default config that I want and what that's going to do is it's going to take this object and essentially clone it into that empty object that I created by copying each value one at a time that way when calling OCR image multiple times I don't somehow accidentally mess up my original config setting and then I'm going to tell it the one property I want to override which is the Tesseract character whitelist property and that's where I'm going to pass in the OCR dictionary from this custom element that I'm searching for so this line got a little bit long so I think I'll copy that into 
its own variable so I'll say let config equals and then pass config in okay I think that will fix the bug so let's rerun it npx ts-node index okay I think that looks a lot better so now the gold is reading as 5 XP 4120 level five okay this battle round is still wrong it should be 2-3 and it's saying 23 but the timer looks correct the streak count looks correct all the character names showed up so this looks great so as I mentioned before for examples like the experience the 4120 where the one should be a slash and in the case of the battle round 23 should be 2-3 I'll have to code something specific just to fix those specific ones but otherwise they should work and another change I might want to make is right here I'm passing in the X Y width and height but I got those coordinates from a screenshot I took at 1920 by 1080. so I may want to do something like this if I make a default width of 1920 and a default height of 1080 and then maybe I have like a current width let's say the user is on so like a 2K monitor is uh 2560 by 1440. 
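The controls dictionary and the per-control whitelist override described above can be sketched as follows. This is an illustration under stated assumptions rather than the code from the video: the type and field names are guesses, the `gold` and `xp` coordinates are placeholders and not measured values, the `lang` default is assumed, and object spread stands in for Lodash's `_.merge` so the block has no dependencies. The image field of the UI element is left out for the same reason.

```typescript
// Assumed type names, layered the way the video describes:
// coordinate -> rectangle -> UI element.
interface Coordinate { x?: number; y?: number; } // optional: location may be unknown
interface Rectangle extends Coordinate { width: number; height: number; }
interface UIElement extends Rectangle { text?: string; ocrDictionary?: string; }

// Character dictionaries used as Tesseract whitelists.
const UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const LOWER = UPPER.toLowerCase();
const ALPHA = LOWER + UPPER;
const NUMERIC = "0123456789";
const ALPHANUMERIC = ALPHA + NUMERIC;

type TesseractConfig = Record<string, string | number>;

// Assumed defaults; only tessedit_char_whitelist matters for this sketch.
const defaultTesseractConfig: TesseractConfig = {
  lang: "eng",
  tessedit_char_whitelist: ALPHANUMERIC,
};

const controls: Record<string, UIElement> = {
  // Placeholder coordinates, NOT taken from the game screenshot.
  gold: { x: 0, y: 0, width: 10, height: 10, ocrDictionary: NUMERIC },
  xp: { x: 0, y: 0, width: 10, height: 10, ocrDictionary: NUMERIC + "/" },
  // The rectangle measured for the Ash card earlier in the video.
  cardChoice: { x: 1290, y: 1042, width: 138, height: 22, ocrDictionary: ALPHA },
};

// Copies the defaults into a fresh object so repeated calls never mutate
// them, then overrides the whitelist for this one control.
function configFor(element: UIElement): TesseractConfig {
  return {
    ...defaultTesseractConfig,
    tessedit_char_whitelist: element.ocrDictionary || ALPHANUMERIC,
  };
}

for (const name of Object.keys(controls)) {
  // In the real loop, the crop and OCR calls would go here.
  console.log(name, configFor(controls[name]).tessedit_char_whitelist);
}
```

Building a new config object per control keeps the default untouched, which is the same reason the video merges into an empty object first.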
so if they happen to be playing on a bigger monitor these coordinates are going to be wrong because everything's going to be scaled so what I can do is calculate a scaling factor so scale factor width equals and I would just divide the actual width by the width that all my coordinates were calculated off of from my screenshots and then I can do the same thing for my height and divide the actual height by the default height and then I can use these scale factors and multiply the X and the width and the Y and the height so that should translate so that if someone's playing on a different screen size the coordinates should still work and one last fix I could do is when I'm specifying the dictionary for my card choices I'm just saying that I want letters and not numbers but I'm not saying what specific words I should be expecting so take for example Alistar if his character name was read as A-i-i-s-t-a-r well look at the letter L and the letter i they're quite similar it's theoretically possible that Tesseract could misread just one letter of that character so essentially I can make an additional function which finds the closest match to a correct character name and corrects it if it misreads just one letter so the way I would code something like that is there is an algorithm called Levenshtein so I'm going to do an npm install fast-levenshtein and what this algorithm does is it takes two words and it analyzes the number of keystrokes that would be required to convert from one to the other so in this case to convert from this to Alistar I would basically just have to highlight the one incorrect letter and change it from an i to an L so essentially saying that there's one letter that's incorrect and if I count the number of letters here one two three four five six seven that basically means there was one incorrect out of seven or I could say six out of seven were correct so if I pulled up a calculator and said six out of seven that's roughly an 85 percent score so that 
would be close enough that I would consider them the same and so I'd be okay to do that spelling correction so the API for Levenshtein is pretty simple to code I'm just going to do import star as levenshtein from fast-levenshtein and then I'm going to call levenshtein.get and I'll pass it my two words so I'll do Alistar spelled correctly and then I'll do it spelled incorrectly and I'm going to comment out my test code for now so that it doesn't run and I'll just output to the screen with a console.log what this Levenshtein comes up with npx ts-node index okay so it says the number one which as I said before means that there's one character that's off so if I divide that by the correct word.length then it should give me the 85 percent that I was mentioning before um okay so it gave me the opposite which is about 15 percent so I would want to just do one minus this because I actually did one out of seven instead of six out of seven so what I could do next is I could pull up a website called tfts.gg and this just lists a whole bunch of data about Teamfight Tactics and I could go into database champions and I could take each champion name so I've got Aatrox Alistar and I could do each one and make an array of champions and then I could make a function called something like pick closest which takes text which is kind of like the compare text and then the dictionary which is an array of strings and then a threshold which is a number and maybe I'll default that to say that it has to be at least 25 percent correct and then I could just loop over each champion in the array and call this Levenshtein function to compare the text from the dictionary to the text that Tesseract found now this is kind of an imperative style of coding which is how generally programmers start coding at first but I prefer functional style I think it's a little bit better because the code is shorter and more reusable so if I were to code this the functional way I would instead take my dictionary and call the map 
function on it and that's essentially the same as a for Loop but it's going to create a new array that contains the results of calling a function on each value of the array so because it's a function the first part of the function is your parameter so I'm just going to call it X to signify the current value that I'm looping over and then you use the arrow function for just like an inline Anonymous function it's called the Lambda you can use the longer syntax of function X curly brace but I just think the Lambda syntax is a little shorter I'm just going to call this similar logic but I'm not going to do any if statements I'm just going to run this calculation oh and I forgot this should not say Alistar here this should be the length of the text I'm comparing and I think it probably should be like math dot Max of the compare length and the text length so essentially if one of them is five characters and the other seven characters it's going to use the larger of the two the seven and use that as the denominator when determining what percentage you got right so anyways I'm just going to take this formula and put it inside of my map function so normally with the Lambda you don't need curly braces but I want to return an object and objects need curly braces and it gets confused if you have double curly braces so I normally you also don't need the return keyword so if the map was just going to like convert everything in the dictionary to the value 5 then this essentially just says return five even though I didn't actually use the word return so here I'm going to have to do a slightly more verbose syntax but I'll say that the text is X and the threshold match is or like the match percentage is this value and I'll use x which was the input to my map function so it's the value of the current Loop I'm in the dictionary that I'm iterating over so I'll use that instead of the text variable that I was using before so that's what the this map function does is it's going to call 
this formula on every single value in the array Okay so so that Maps it but next I need to handle this if statement so after the map I'm going to chain it with a DOT filter and because my line of code is getting too long I'm just going to hit the enter key to put this filter on the next line below but I'm going to filter to say that I only want to keep values from this dictionary that are greater than or equal to the threshold so that's essentially doing what this if statement was doing except this x is a text and a match percentage so I need to check the match percentage against the threshold and then finally I'm going to use the underscore dot order by function and pass in this entire result so and that's the beauty of functional coding is it's designed so that you can chain things together so it's much shorter and reusable compared to imperative code with like normal for loops and variables so I'm going to do an order by and the reason is to sort it so that I get just the best one so outside of the filter I'm going to do a Lambda so again I'll use x as my parameter and then I'll say x dot match percentage so that's going to sort this entire result based on the match percentage and by default it's going to be ascending meaning that the smallest match percentage will be first but afterwards I'm just going to want to grab the first one which is the best and I'm going to want to return it so this really should be in order by descending instead of ascending so I can do that by just timesing this match percentage by negative one so I'll say that this is my result and I'll return result dot text so now I can delete this for Loop code so it's kind of up to you whether you prefer the imperative style or the functional style functional takes a while to get used to because it looks a lot more complicated in the beginning but once you get used to it I really like it so that's essentially how how I could create a dictionary of champion names and make sure I get the best one 
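The map, filter, and sort chain described above can be sketched as follows. This is an illustration, not the code from the video: to keep it dependency-free, a small hand-written `levenshtein` function stands in for the `fast-levenshtein` package, the built-in `Array.prototype.sort` with a descending comparator stands in for Lodash's `orderBy`, and the name `pickClosest`, the 0.25 default threshold, and the two-name champion list are assumptions based on what the transcript mentions.

```typescript
// Classic dynamic-programming edit distance: the number of single-character
// insertions, deletions, or substitutions needed to turn a into b.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Returns the dictionary entry most similar to text, or undefined when no
// entry reaches the threshold (a fraction between 0 and 1).
function pickClosest(
  text: string,
  dictionary: string[],
  threshold: number = 0.25
): string | undefined {
  const matches = dictionary
    .map(x => ({
      text: x,
      // 1 minus (errors / longer length) gives the fraction that matched.
      matchPercentage: 1 - levenshtein(text, x) / Math.max(text.length, x.length, 1),
    }))
    .filter(x => x.matchPercentage >= threshold)
    // Descending order, so the best match ends up first.
    .sort((p, q) => q.matchPercentage - p.matchPercentage);
  return matches.length > 0 ? matches[0].text : undefined;
}

// Only the two champion names mentioned in the transcript.
const champions = ["Aatrox", "Alistar"];
console.log(pickClosest("Aiistar", champions)); // prints Alistar
```

Sorting in descending order directly avoids the multiply-by-negative-one trick, at the cost of not using `orderBy`; either approach leaves the best match at index zero.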
so the only other two things that I should scrape would be in this unit overlay image here I've clicked on a character and it pops up and it shows who the character is and all their attributes the reason that might be useful is if I get a free unit from picking up a bonus after killing a monster and that would be a unit that I didn't choose to purchase from the carts down here so I could technically use opencv to analyze the 3D model to see who this character is but that's going to be much more difficult than just reading the text of their name and it's not too hard for me to just right click on the character to get this pop-up to show so that I can analyze the text so that's the approach I plan to use I haven't showed how to code that but it's going to be almost identical to the other Tesseract code just putting in screen coordinates the only difference is that this pop-up is going to be at a place that it's anchored to where the 3D model is so it's not going to have a fixed x and y coordinate so I'll need to think of where I clicked and then do an offset from where I clicked and the final one would be in the battle rounds it calculates damage while they're fighting so you can see on the right side here it shows each of my characters and how much damage they in this case block but if I click these icons down here there's also how much damage they dealt and how much damage they healed or shielded and that will be useful for me to figure out which of my characters is the strongest and I probably won't use that to make a change in my strategy during the currents game but I'll track those statistics and save them into a database so that after I've played a few hundred games I can use that to analyze statistics on which Champions do the best in certain combinations and so that I can use that for planning for future games and deciding which characters to purchase again that's something that can partially be done with Tesseract just to read the number but then I'll need 
to do image scraping with opencv to figure out which character it is based on this icon so I'll show the image scraping in the next video but the Tesseract portion I've already demonstrated here so I'm not going to code every single example since they're all very similar but now you should have a general overview of how Tesseract works so you can incorporate it into your projects so that's everything I wanted to demonstrate hopefully you found this video useful please let me know what feedback you have on how I could improve my code and thanks for watching
