The black box metaphor in machine learning
Deep learning is a
machine learning technique that teaches computers to do what comes
naturally to humans: learn by example. Deep learning is a key technology
behind driverless cars, enabling them to recognize a stop sign, or to
distinguish a pedestrian from a lamppost. It is the key to voice control
in consumer devices like phones, tablets, TVs, and hands-free speakers.
Deep learning is getting lots of attention lately, and for good reason.
It’s achieving results that were not possible before.
In deep learning, a
computer model learns to perform classification tasks directly from
images, text, or sound. Deep learning models can achieve
state-of-the-art accuracy, sometimes exceeding human-level performance.
Models are trained by using a large set of labeled data and neural
network architectures that contain many layers.
1. Explanations
When people
think about artificial intelligence, they typically seem to have in mind
the first kind of explanation. The expectation is that the system made a
deliberation and chose a course of action based on the expected
outcome. Although there are cases where this is possible, increasingly
we are seeing a move towards systems that are more similar to the second
case; that is, they receive stimuli and then they just react.
There
are very good reasons for this (not least because the world is
complicated), but it does mean that it’s harder to understand why a
particular decision was made, or why we ended up with one
model as opposed to another. With that in mind, let’s dig into what we
mean by a model, and the metaphor of the black box.
2. Boxes and Models
The
black box metaphor dates back to the early days of cybernetics and
behaviourism, and typically refers to a system for which we can only
observe the inputs and outputs, but not the internal workings. Indeed,
this was the way in which B. F. Skinner conceptualized minds in general.
Although he successfully demonstrated how certain learned behaviours
could be explained by a reinforcement signal which linked certain inputs
to certain outputs, he then famously made the mistake of thinking that
this theory could easily explain all of human behaviour, including language.
As
a simpler example of a black box, consider a thought experiment from
Skinner: you are given a box with a set of inputs (switches and buttons)
and a set of outputs (lights which are either on or off). By
manipulating the inputs, you are able to observe the corresponding
outputs, but you cannot look inside to see how the box works. In the
simplest case, such as a light switch in a room, it is easy to determine
with great confidence that the switch controls the light level. For a
sufficiently complex system, however, it may be effectively impossible
to determine how the box works by just trying various combinations.
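The combinatorial problem in this thought experiment can be made concrete with a small Python sketch. The `hidden_box` function is a hypothetical box invented for illustration (three switches, two lights), and `probe` plays the experimenter, who can only flip switches and read lights. The sketch also assumes the box has no memory, so each output depends only on the current switch positions; a box with internal state would require probing sequences of inputs, which is harder still.

```python
from itertools import product


def hidden_box(switches):
    """Hypothetical black box: 3 switches in, 2 lights out.
    The experimenter is assumed to call it without seeing this code."""
    a, b, c = switches
    return (a and b, (a or c) and not b)


def probe(box, n_inputs):
    """Record the lights for every on/off combination of the switches."""
    return {combo: box(combo) for combo in product((False, True), repeat=n_inputs)}


table = probe(hidden_box, 3)
for inputs, lights in table.items():
    print(inputs, "->", lights)

# The number of experiments needed doubles with every extra switch.
for n in (3, 10, 40):
    print(n, "switches:", 2 ** n, "combinations")
```

With 3 switches the full truth table takes 8 experiments; with 40 switches it would take over a trillion, which is why exhaustive probing stops being a practical way to explain a complex box.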
Now imagine that you are allowed to open up the box and look inside. You are even given a full wiring diagram, showing what all the components are, and how they are connected. Moreover, none of the components are complex in and of themselves; everything is built up from simple components such as resistors and capacitors, each of which has behaviour that is well understood in isolation. Now, not only do you have access to the full specification of all the components in the system, you can even run experiments to see how each of the various components responds to particular inputs.
You
might think that with all this information in hand, you would now be in a
position to give a good explanation of how the box works. After all,
each individual component is understood, and there is no hidden
information. Unfortunately, complexity arises from the interaction of
many simple components. For a sufficiently complex system, it is
unlikely you’d be able to predict what the output of the box will be for
a given input, without running the experiment to find out. The only
explanation for why the box is doing what it does is that all of the
components are following the rules that govern their individual
behaviour, and the overall behaviour emerges from their interactions.
Even more importantly, beyond the how of the system, you would likely be at a loss to explain why each component had been placed where it is, even if you knew the overall purpose of the system. Given that the box was designed
for some purpose, we assume that each component was added for a reason.
For a particularly clever system, however, each component might end up
taking on multiple roles, as in the case of DNA. Although this can lead
to a very efficient system, it also makes it very difficult to even
think about summarizing the purpose of each component. In other words, the how of the system is completely transparent, but the why is potentially unfathomable.
This,
as it turns out, is a perfect metaphor for deep learning. In general,
the entire system is open to inspection. Moreover, it is entirely made
up of simple components that are easily understood in isolation. Even if
we know the purpose of the overall system, however, there is not
necessarily a simple explanation we can offer as to how
the system works, other than the fact that each individual component
operates according to its own rules, in response to the input. This,
indeed, is the true explanation of how the system works, and it is
entirely transparent. The tougher question, of course, is why each component has taken on the role that it has. To understand this further, it will be helpful to separate the idea of a model from the algorithm used to train it.
3. Models and Algorithms
To
really get into the details, we need to be a bit more precise about
what we’re talking about. Harris refers to “how the algorithm is
accomplishing what it’s accomplishing”, but there are really two parts
here: a model (such as a deep learning system) and a learning
algorithm (which we use to fit the model to data). When Harris refers to
“the algorithm”, he is presumably talking about the model, not
necessarily how it was trained.
What exactly do we mean by a model? Although perhaps somewhat vague, a statistical model
basically captures the assumptions we make about how things work in the
world, with details to be learned from data. In particular, a model
specifies what the inputs are, what the outputs are, and typically how
we think the inputs might interact with each other in generating the
output.
A
classic example of a model is the set of equations that govern Newtonian
gravity. The model states that the output (the force of gravity between
two objects) is determined by three input values: the mass of the first
object, the mass of the second object, and the distance between them.
More precisely, it states that gravity will be proportional to the
product of the two masses, divided by the distance squared. Critically,
it doesn’t explain why these
factors should be the factors that influence gravity; it merely tries to
provide a parsimonious explanation that allows us to predict gravity
for any situation.
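In symbols, the Newtonian model is F = G·m₁·m₂/r². A minimal Python sketch makes the point that the model is just a formula over three inputs and one constant. G is hard-coded to its commonly quoted approximate value, and the Earth figures (mass about 5.972 × 10²⁴ kg, mean radius about 6.371 × 10⁶ m) are approximate values used only for illustration.

```python
G = 6.674e-11  # gravitational constant in N·m²/kg² (approximate)


def gravitational_force(m1_kg, m2_kg, distance_m, g=G):
    """Newtonian model: force proportional to the product of the masses,
    divided by the squared distance between them."""
    return g * m1_kg * m2_kg / distance_m ** 2


# A 1 kg mass at Earth's surface: the result is roughly 9.8 N, its weight.
print(gravitational_force(5.972e24, 1.0, 6.371e6))
```

Note that nothing in the code says *why* the masses multiply or why the distance is squared; like the model it encodes, it only predicts.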
Of course, even if this were completely correct, in order to be able to make a prediction, we also need to know the corresponding scaling factor, G. In
principle, however, it should be possible to learn this value through
observation. If we have assumed the correct (or close to correct) model
for how things operate in reality, we have a good chance of being able
to learn the relevant details from data.
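To illustrate learning the scaling factor from observation, here is a hedged sketch that assumes the Newtonian form is correct and estimates G by ordinary least squares. The measurements are synthetic (generated from the known value with an assumed 1% multiplicative noise), so this only demonstrates the fitting step, not how G is actually measured in a laboratory.

```python
import random


def fit_scaling_constant(observations):
    """Least-squares estimate of G for the model F = G * (m1 * m2 / r**2).
    With a single multiplicative parameter the solution is closed-form:
    G_hat = sum(x * F) / sum(x * x), where x = m1 * m2 / r**2."""
    num = den = 0.0
    for m1, m2, r, force in observations:
        x = m1 * m2 / r ** 2
        num += x * force
        den += x * x
    return num / den


# Synthetic measurements: masses in kg, distances in m, forces with 1% noise.
# true_g is used only to generate the data; the fit never sees it directly.
rng = random.Random(0)
true_g = 6.674e-11
data = []
for _ in range(200):
    m1, m2, r = rng.uniform(1, 100), rng.uniform(1, 100), rng.uniform(0.1, 1.0)
    force = true_g * m1 * m2 / r ** 2 * (1 + rng.gauss(0, 0.01))
    data.append((m1, m2, r, force))

print(fit_scaling_constant(data))  # close to 6.674e-11
```

Because the model form is assumed correct, one parameter and a couple of hundred noisy observations are enough; a wrong form would leave systematic errors no amount of data could remove.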
In
the case of gravity, of course, Einstein eventually showed that Newton’s
model was only approximately correct, and that it fails in extreme
conditions. For most circumstances, however, the Newtonian model is good
enough, which is why people were able to learn the constant G ≈ 6.674 × 10⁻¹¹ N·m²/kg² and use it to make predictions.
Einstein’s
model is much more complex, with more details to be learned through
observation. In most circumstances, it gives approximately the same
prediction as the Newtonian model would, but it is more accurate in
extreme circumstances, and of course has been essential in the
development of technologies such as GPS. Even more impressively, the
secondary predictions of relativity have been astounding, successfully
predicting, for example, the existence of black holes before we could
ever hope to test for their existence. And yet we know that Einstein’s
model, too, is not completely correct, as it fails to agree with the
models of quantum mechanics under even more extreme conditions.
Gravitation,
of course, is deterministic (as far as we know). In machine learning
and statistics, by contrast, we are typically dealing with models that
involve uncertainty or randomness. For example, a simple model of how
long you are going to live would be to just predict the average of the
population for the country in which you live. A better model might take
into account relevant factors, such as your current health status, your
genes, how much you exercise, whether or not you smoke cigarettes, etc.
In pretty much every case, however, there will be some uncertainty about
the prediction, because we don’t know all the relevant factors. (This
is different, of course, from the apparent true randomness that occurs at
the sub-atomic level, but we won’t worry about that difference here.)
In
addition to being an incredibly successful rebranding of neural
networks and machine learning (itself arguably a rather successful
rebranding of statistics), the term deep learning
refers to a particular type of model, one in which the outputs are the
results of a series of many simple transformations applied to the inputs
(much like our wiring diagram from above). Although deep learning
models are certainly complex, they are not black boxes. In fact, it
would be more accurate to refer to them as glass boxes, because we can
literally look inside and see what each component is doing.
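To see what "glass box" means in practice, the following NumPy sketch builds a tiny hypothetical network (3 inputs, 4 hidden units, 1 output, random weights) as a chain of simple transformations and records every intermediate value. The sizes, the ReLU nonlinearity, and the random weights are illustrative assumptions, not a specific published architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer network: 3 inputs -> 4 hidden units -> 1 output.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)


def relu(z):
    """Elementwise max(z, 0): the only nonlinearity in this network."""
    return np.maximum(z, 0.0)


def forward(x, trace=None):
    """Apply a series of simple transformations, optionally recording each step."""
    steps = [("linear 1", lambda v: W1 @ v + b1),
             ("relu", relu),
             ("linear 2", lambda v: W2 @ v + b2)]
    for name, fn in steps:
        x = fn(x)
        if trace is not None:
            trace.append((name, x.copy()))
    return x


trace = []
y = forward(np.array([1.0, -0.5, 2.0]), trace)
for name, value in trace:
    print(f"{name:>8}: {value}")
```

Every weight (`W1`, `b1`, `W2`, `b2`) and every intermediate activation can be printed and inspected; the difficulty lies in interpreting them, not in accessing them.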
The
problem, of course, is that these systems are also complicated. If I
give you a simple set of rules to follow in order to make a prediction,
as long as there aren’t too many rules and the rules themselves are
simple, you could pretty easily figure out the full set of
input-to-output mappings in your mind. This is also true, though to a
lesser extent, with a class of models known as linear models, where the effect of changing any one input can be interpreted without knowing about the value of other inputs.
Deep
learning models, by contrast, typically involve non-linearities and
interactions between inputs, which means that not only is there no
simple mapping from inputs to outputs, but the effect of changing one input
may depend critically on the values of other inputs. This makes it
very hard to mentally figure out what’s happening, but the details are
nevertheless transparent and completely open to inspection.
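The contrast between linear and non-linear models can be shown with two hypothetical toy functions. `effect_of_x1` measures how much the output changes when x1 moves from 0 to 1 while x2 is held fixed. For the linear model the answer is the same at every x2; for the model with a ReLU and an interaction term, it depends on x2.

```python
def linear_model(x1, x2):
    # Hypothetical linear model: each input has a fixed, separable effect.
    return 2.0 * x1 + 3.0 * x2


def nonlinear_model(x1, x2):
    # Hypothetical model with a ReLU-style threshold and an x1 * x2 interaction.
    return max(0.0, x1 * x2 - 1.0)


def effect_of_x1(model, x2, delta=1.0):
    """Change in output when x1 goes from 0 to delta, holding x2 fixed."""
    return model(delta, x2) - model(0.0, x2)


for x2 in (0.0, 1.0, 5.0):
    print(x2, effect_of_x1(linear_model, x2), effect_of_x1(nonlinear_model, x2))
```

The linear model reports an effect of 2.0 at every x2, so its coefficient can be read off on its own. The non-linear model reports 0.0, 0.0, and 4.0, so "the effect of x1" is not a single number at all.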
The
actual computation performed by these models in making a prediction is
typically quite straightforward; where things get difficult is in the
actual learning of the model parameters from data. As described above,
once we assume a certain form for a model (in this case, a flexible
neural network), we then need to try to figure out good values for the
parameters from data.
In
the example of gravity, once we have assumed a “good enough” model
(proportional to mass and inversely proportional to distance squared),
we just need to resolve the value of one parameter (G), by fitting the model to observations. With modern deep learning systems, by contrast, there can easily be millions of such parameters to be learned.
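To give a sense of scale, this sketch counts the weights and biases of a fully connected network. The layer sizes are a hypothetical example (784 inputs, as for a flattened 28×28 image, two hidden layers of 1,000 units, and 10 outputs), chosen only to show how quickly the count reaches millions, compared with the single parameter G of the Newtonian model.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.
    A layer mapping n_in -> n_out has n_in * n_out weights and n_out biases."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


newtonian_parameters = 1  # just G
mlp_parameters = count_dense_parameters([784, 1000, 1000, 10])
print(f"Newtonian model: {newtonian_parameters}")
print(f"Hypothetical MLP: {mlp_parameters:,}")  # 1,796,010
```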
In
practice, nearly all of these deep learning models are trained using
some variant of an algorithm called stochastic gradient descent (SGD),
which takes random samples from the training data, and gradually adjusts
all parameters to make the predicted output more like what we want.
Exactly why it works as well as it does is still not well understood,
but the main thing to keep in mind is that it, too, is transparent.
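The description of SGD above can be sketched in a few lines of Python. This is a minimal illustration assuming a one-input linear model y ≈ w·x + b, squared-error loss, minibatches of 8, and a fixed learning rate. Real deep learning frameworks apply the same idea to millions of parameters, computing gradients with automatic differentiation rather than by hand. The training data here are synthetic and noise-free.

```python
import random


def sgd_linear_regression(data, lr=0.05, steps=1000, batch_size=8, seed=0):
    """Fit y ≈ w * x + b with minibatch stochastic gradient descent
    on the mean squared error."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)   # random initialization
    for _ in range(steps):
        batch = rng.sample(data, batch_size)         # random sample of the training data
        grad_w = grad_b = 0.0
        for x, y in batch:
            error = (w * x + b) - y                  # prediction minus target
            grad_w += 2 * error * x / batch_size
            grad_b += 2 * error / batch_size
        w -= lr * grad_w                             # small step against the gradient
        b -= lr * grad_b
    return w, b


# Synthetic, noise-free training data from y = 3x + 1 on [-1, 1].
data = [(x, 3 * x + 1) for x in (i / 50 - 1 for i in range(101))]
print(sgd_linear_regression(data))  # approximately (3.0, 1.0)
```

Every step is an explicit arithmetic update that can be logged and replayed, which is the sense in which the training algorithm is transparent.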
Because
it is usually initialized with random values for all parameters, and
because it draws its training samples in a random order, SGD can lead to
different parameters each time we run it. The algorithm itself, however,
is deterministic: if we used the same initialization, the same data, and
the same sequence of random samples (in practice, the same random seed),
it would produce the same result. In
other words, neither the model nor the algorithm is a black box.
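A small experiment illustrates both halves of this claim. The sketch uses a hypothetical over-parameterized model, y_hat = a·b·x, fitted by SGD to data generated from y = 2x. Any pair with a·b = 2 fits perfectly, so runs with different seeds can settle on different parameters. Reruns with the same seed, which fixes both the initialization and the sampling order, are bit-for-bit identical. The model, learning rate, and step count are illustrative assumptions.

```python
import random


def train(seed, steps=2000, lr=0.05):
    """SGD on the over-parameterized model y_hat = a * b * x with targets y = 2x.
    Every (a, b) with a * b == 2 fits the data exactly."""
    rng = random.Random(seed)  # one seed fixes the init AND the sampling order
    data = [(k / 10, 2 * k / 10) for k in range(-10, 11)]
    a, b = rng.uniform(0.5, 2.0), rng.uniform(0.5, 2.0)  # random initialization
    for _ in range(steps):
        x, y = rng.choice(data)  # one randomly chosen example per step
        error = a * b * x - y
        # Simultaneous update using the gradients of error**2 w.r.t. a and b.
        a, b = a - lr * 2 * error * b * x, b - lr * 2 * error * a * x
    return a, b


print(train(0), train(0))  # identical
print(train(1))            # different parameters, but a * b is still about 2
```

Both seeds produce models that make the same predictions, yet their internal parameters differ. This is a miniature version of why "why does this parameter have this value?" can only be answered by pointing to the model, the data, and the random choices made during training.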
Although
it is somewhat unsatisfying, the complete answer to why a machine
learning system did something ultimately lies in the combination of the
assumptions we made in designing the model, the data it was trained on, and
various decisions made about how to learn the parameters, including the
randomness in the initialization.
4. Back to black boxes
Why does all this matter? Well, there are at least two ways in which the concept of black boxes is highly relevant to machine learning.
First, there are plenty of algorithms and software systems (and not just those based on machine learning) that are
black boxes as far as the user is concerned. This is perhaps most
commonly the case in proprietary software, where the user doesn’t have
access to the inner workings, and all we get to see are the inputs and
outputs. This is the sort of system that ProPublica reported on
in its coverage of judicial sentencing algorithms (specifically the
COMPAS system from Northpointe). In that case, we know the inputs, and
we can see the risk scores that have been given to people as the output. We
don’t, however, have access to the algorithm used by the company, or the
data it was trained on. Nevertheless, it is safe to say that someone has access to the details (presumably the employees of the company), and it is very likely completely transparent to them.
The
second way in which the metaphor of black boxes is relevant is with
respect to the systems we are trying to learn, such as human vision. In
some ways, human behaviour is unusually transparent, in that we can
actually ask people why they did something, and obtain explanations.
However, there is good reason to believe that we don’t always know the
true reasons for the things we do. Far from being transparent to
ourselves, we simply don’t have conscious access to many of the internal
processes that govern our behaviour. If asked to explain why we did
something, we may be able to provide a narrative that at least conveys
how the decision making process felt to us. If asked to explain how we
are able to recognize objects, by contrast, we might think we can
provide some sort of explanation (something involving edges and
colours), but in reality, this process operates well below the level of
consciousness.
Although
there are special circumstances in which we can actually inspect the
inner workings of human or other mammalian systems, such as neuroscience
experiments, in general we are trying to use machine learning to mimic
human behaviour using only the inputs and the outputs. In other words,
from the perspective of a machine learning system, the human is the black box.
6. Conclusion
In conclusion, it’s useful to reflect on what people want when they think of systems that are not
black boxes. People typically imagine something like the scenario in
which a self-driving car has gone off the road, and we want to know why.
In the popular imagination, the expectation seems to be that the car
must have evaluated possible outcomes, assigned them probabilities, and
chosen the one with the best chance of producing a good outcome,
where “good” is determined according to some sort of morality that has
been programmed into it.
In
reality, it is highly unlikely that this is how things will work.
Rather, if we ask the car why it did what it did, the answer will be
that it applied a transparent and deterministic computation using the
values of its parameters, given its current input, and this determined
its actions. If we ask why it had those particular parameters, the
answer will be that they are the result of the model that was chosen,
the data it was trained on, and the details of the learning algorithm
that was used.
This
does seem frustratingly unhelpful, and it is easy to see why people
reach for the black box metaphor. However, consider that we don’t
actually have this kind of access for the systems we are trying to
mimic. If we ask a human driver why they went off the road, they will
likely be capable of responding in language, and giving some account of
themselves — that they were drunk, or distracted, or had to swerve, or
were blinded by the weather — and yet aside from providing some sort of
narrative coherence, we don’t really know why they did it, and neither
do they. At least with machine learning, we can recreate the same
setting and probe the internal state. It might be complicated to
understand, but it is not a black box.
Block Diagram
What is a Block Diagram?
A block diagram is a specialized, high-level flowchart used in engineering. It is used to design new systems or to describe and improve existing ones. Its structure provides a high-level overview of major system components, key process participants, and important working relationships.
Types and Uses of Block Diagrams
A block diagram provides a quick, high-level view of a system to rapidly identify points of interest or trouble spots. Because of its high-level perspective, it may not offer the level of detail required for more comprehensive planning or implementation. A block diagram will not show every wire and switch in detail; that's the job of a circuit diagram.
A block diagram is especially focused on the input and output of a system, and is less concerned with what happens between input and output. In engineering, this principle is referred to as a black box: either the parts that get us from input to output are not known, or they are not important.
How to Make a Block Diagram
Block diagrams are made similarly to flowcharts. You will want to create blocks, often represented by rectangular shapes, that represent important points of interest in the system from input to output. Lines connecting the blocks will show the relationship between these components.
In SmartDraw, you'll want to start with a block diagram template that already has the relevant library of block diagram shapes docked. Adding, moving, and deleting shapes is easy with just a few keystrokes or drag-and-drop. SmartDraw's block diagram tool will help build your diagram automatically.
Symbols Used in Block Diagrams
Block diagrams use very basic geometric shapes: boxes and circles. The principal parts and functions are represented by blocks connected by straight and segmented lines illustrating relationships.
When block diagrams are used in electrical engineering, the arrows connecting components represent the direction of signal flow through the system.
Whatever any specific block represents should be written on the inside of that block.
A block diagram can also be drawn in increasing detail if analysis requires it. Feel free to add as little or as much detail as you want using more specific electrical schematic symbols.
Block Diagram: Best Practices
- Identify the system. Determine the system to be illustrated. Define components, inputs, and outputs.
- Create and label the diagram. Add a symbol for each component of the system, connecting them with arrows to indicate flow. Also, label each block so that it is easily identified.
- Indicate input and output. Label the input that activates a block, and label the output that ends the block.
- Verify accuracy. Consult with all stakeholders to verify accuracy.
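The best practices above map naturally onto a simple data structure. As a hedged sketch (not tied to SmartDraw or any other diagramming tool), a block diagram can be stored as labelled blocks plus directed arrows for signal flow. The system's inputs and outputs are then the blocks with no incoming or no outgoing arrows. The audio-amplifier chain is a hypothetical example, and the last step, verifying accuracy with stakeholders, remains a human review.

```python
from dataclasses import dataclass, field


@dataclass
class BlockDiagram:
    """Minimal block diagram: labelled blocks plus directed signal-flow arrows."""
    blocks: dict = field(default_factory=dict)   # block id -> label
    arrows: list = field(default_factory=list)   # (source id, target id)

    def add_block(self, block_id, label):
        self.blocks[block_id] = label

    def connect(self, source, target):
        # Refuse arrows to or from blocks that were never identified and labelled.
        if source not in self.blocks or target not in self.blocks:
            raise KeyError(f"unknown block in arrow {source} -> {target}")
        self.arrows.append((source, target))

    def inputs(self):
        """Blocks with no incoming arrow: where signals enter the system."""
        targets = {t for _, t in self.arrows}
        return [b for b in self.blocks if b not in targets]

    def outputs(self):
        """Blocks with no outgoing arrow: where signals leave the system."""
        sources = {s for s, _ in self.arrows}
        return [b for b in self.blocks if b not in sources]


# Hypothetical example: a simple audio amplifier chain.
d = BlockDiagram()
for bid, label in [("mic", "Microphone"), ("pre", "Preamplifier"),
                   ("amp", "Power amplifier"), ("spk", "Speaker")]:
    d.add_block(bid, label)
d.connect("mic", "pre")
d.connect("pre", "amp")
d.connect("amp", "spk")
print("inputs:", d.inputs(), "outputs:", d.outputs())
```

Nothing in the structure records how each block works internally, which mirrors the black box principle described earlier: only the labelled blocks and the flow between them are represented.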
Block Diagram Examples
The best way to understand block diagrams is to look at some examples.
[Figure: Block Diagram Scoreboard]
“For example, we make everything smaller. The physical accuracy of the models hasn't changed, but we're entering regimes where there's increasing crosstalk between components simply because we're packing them together more closely.”
New product demands, such as the push for greener technologies, which calls for ever-improving energy minimization, are creating an environment for design engineers in which simulation-based verification alone is simply not practical. “When you're designing a product, such as, say, a cellphone, you have maybe about a hundred or so components on the circuit board. That's a lot of design values. To completely explore that design space and try every possible combination of components is unfeasible.”
“When we want to do design optimization we can't be concerned with every single variable inside the system. All we really care about is what's happening in the aggregate: the signals at the outside of the device where the humans are interacting with it. So we want these abstracted models, and that's what machine learning gives you: models that you then use for simulation.”
Accomplishing this is no small task, given that simulations require engineers to model everything in a system, and all of those effects can be represented, and completely data-driven modeling, not based on any prior knowledge of what's inside the system. To do this they need to use machine learning algorithms that can predict a particular output and represent the behaviors of particular components.
Machine learning-based modeling also offers several other benefits that should be attractive to companies, such as the ability to share models without revealing vital intellectual property (IP). “Because behavioral models only describe, say, input/output characteristics, they don't tell you what's inside the black box. They preserve or obscure IP. With a behavioral model a supplier can easily share that model with their customer without disclosing proprietary information.
It allows for the free flow of critical information and it allows the customer then to be able to design their system using that model from the supplier.”
Most integrated circuit manufacturers, for example, use Input/Output Buffer Information Specification (IBIS) models to share information about input/output (I/O) signals with customers, while also protecting IP. IBIS models tell you absolutely nothing about the circuit design details.
“Where machine learning can help is to make models such as IBIS better. IBIS models don't represent interactions between the multiple I/O pins of an integrated circuit. There's a lot of unintended coupling that current models can't replicate. But with more powerful methods based on machine learning for obtaining models, next-gen models may be able to capture those important effects.”
The other great benefit would be reduced time to market. In the current state of circuit design there's almost a sense of planned failure that eats up a lot of development time. “Many chips don't pass qualification testing and need to undergo a re-spin. With better models we can get designs right the first time.” She has a background in system-level ESD, a world she said is built on trial and error and would benefit greatly from behavioral modeling. “[Design engineers] make a product, say a laptop, it undergoes testing, probably fails, then they start sticking additional components on the circuit board until it passes... and it wastes a lot of time. We build in time to fix things, but it's often by the seat of one's pants. If we had accurate models for how these systems would respond to ESD, we could design them to pass qualification testing the first time.”
The willingness and interest in machine learning-based behavioral models is there, but the hurdles are in the details. How do you actually do this? Today, machine learning finds itself being largely applied to image recognition, natural language processing, and, perhaps most ignominiously, the sort of behavior prediction that lets Google guess what ads it wants to serve you. “There's only been a little bit of work in regard to electronics modeling. We have to figure out all the details. We're working with real measurement data. How much do you need? Do you need to process or filter it before delivering it to the algorithm? And which algorithms are suitable for representing electronic components and systems?
We have to answer all of those questions.”
CAEML's aim is to demonstrate, over a five-year period, that machine learning can be applied to modeling for many different applications within the realm of electronics design. As part of that, the center will be doing foundational research on the machine learning algorithms themselves, identifying the ones that are most suitable and how to use them.
“Although we're working on many applications (signal integrity analysis, IP reuse, power delivery network design, even IC layouts and physical design), all of which require models, there are common problems that we're facing, and a lot of them pertain to working with a limited set of real measurement data. Machine learning theorists have really only focused on the algorithm. They assumed there's an unlimited quantity of data available, and that's not realistic, at least in our domain. In order to get data you have to fabricate samples and measure them, which takes time and money. The amount of data, though it seems huge to us, is very small compared to what they use in the field.”
Electronics and Artificial Intelligence
Through Artificial Intelligence, ITCL aims for the design and implementation of processes that, when run on physical architectures, trigger the maximisation of certain results by means of the actions they set off, always focusing on productivity enhancement.
The area has a long history of working
with companies throughout the country, having participated in
a number of projects, both R&D and process improvement,
with notable developments in application areas where
intelligent systems, advanced communications, and microelectronics are key
factors (Smart Cities, Smart Energy, Industrial Internet of Things –
IIoT, Industry 4.0, Factories of the Future – FoF, Machine-to-Machine
Technology – M2M).
Lines of work of the research group
Some of the lines of work of the research group in electronics and artificial intelligence are the following:
- Design and prototyping of electronic boards and advanced devices for integration in equipment and telemedicine
- ARM Solution Design
- Programming embedded systems under Linux
- Programming of microprocessors for data acquisition, control and advanced communications.
- Data analysis in intelligent systems: Design of custom algorithms and design of experiments. Feature selection, feature-based process optimization, dynamic data series clustering, output model prediction.
- Specific developments for sustainable mobility: domestic and collective electric vehicle charging systems, demand analysis of loads, energy distribution according to constraints, localization and control systems, carpooling, carsharing, route optimization.
- Design of intelligent systems for precision agriculture, infrastructures and cities.
Technology samples
- Microcontrollers: several architectures (ARM, RISC) and manufacturers (Microchip, Atmel, Freescale, Raspberry).
- Wireless Communication: ZigBee, Bluetooth v2.0 and v4.0, radio-frequency (434 MHz), Wi-Fi, GPRS, RFID Mifare.
- Communication Protocols: RS-485, RS-232, I2C, SPI, CAN bus, PLC, UART, TCP/IP.
- Positioning: GPS, GPRS.
- Sensors: temperature (thermocouple, PT100, RTD), light sensors, load cells, movement sensors (accelerometers, gyroscopes), biosensors, potential-free contacts, touch sensors, skin conductivity, infrared sensors, etc.
- HMI (Human Machine Interface): light and acoustic indicators, displays (LCD, graphic, alphanumeric, etc.), keyboards.
Deep Learning
How does deep learning attain such impressive results?
In a word, accuracy. Deep learning achieves recognition accuracy at higher levels than ever before. This helps consumer electronics meet user expectations, and it is crucial for safety-critical applications like driverless cars. Deep learning has recently improved to the point where it outperforms humans in some tasks, such as classifying objects in images.
While deep learning was first theorized in the 1980s, there are two main reasons it has only recently become useful:
- Deep learning requires large amounts of labeled data. For example, driverless car development requires millions of images and thousands of hours of video.
- Deep learning requires substantial computing power. High-performance GPUs have a parallel architecture that is efficient for deep learning. When combined with clusters or cloud computing, this enables development teams to reduce training time for a deep learning network from weeks to hours or less.
Examples of Deep Learning at Work
Deep learning applications are used in industries from automated driving to medical devices.
Automated Driving: Automotive researchers are using deep learning to automatically detect objects such as stop signs and traffic lights. In addition, deep learning is used to detect pedestrians, which helps decrease accidents.
Aerospace and Defense: Deep learning is used to identify objects from satellites that locate areas of interest, and identify safe or unsafe zones for troops.
Medical Research: Cancer researchers are using deep learning to automatically detect cancer cells. Teams at UCLA built an advanced microscope that yields a high-dimensional data set used to train a deep learning application to accurately identify cancer cells.
Industrial Automation: Deep learning is helping to improve worker safety around heavy machinery by automatically detecting when people or objects are within an unsafe distance of machines.
Electronics: Deep learning is being used in automated hearing and speech translation. For example, home assistance devices that respond to your voice and know your preferences are powered by deep learning applications.
How Deep Learning Works
Most deep learning methods use neural network architectures, which is why deep learning models are often referred to as deep neural networks.
The term “deep” usually refers to the number of hidden layers in the neural network. Traditional neural networks only contain 2-3 hidden layers, while deep networks can have as many as 150.
Deep learning models are trained by using large sets of labeled data and neural network architectures that learn features directly from the data without the need for manual feature extraction.
One of the most popular types of deep neural networks is known as convolutional neural networks (CNN or ConvNet).
A CNN convolves learned features with input data, and uses 2D
convolutional layers, making this architecture well suited to processing
2D data, such as images.
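The core operation of a convolutional layer can be sketched directly in NumPy. What deep learning libraries call convolution is usually cross-correlation (the kernel is not flipped), and that is what this sketch computes. The 3×2 vertical-edge kernel is hand-picked for illustration; in a CNN, kernel weights like these are learned during training rather than chosen by hand.

```python
import numpy as np


def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, the operation deep learning libraries
    usually call convolution: slide the kernel over the image and take
    the sum of elementwise products at each position."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# A 5x6 image: dark left half, bright right half (a vertical edge).
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Hand-picked vertical-edge kernel; in a CNN these weights would be learned.
kernel = np.array([[-1.0, 1.0]] * 3)   # shape (3, 2)

print(conv2d(image, kernel))
```

The output is zero everywhere except in the column where dark meets bright. This is the kind of edge detector that, as described below, the first hidden layer of a CNN often ends up learning.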
CNNs eliminate the need for manual feature extraction, so you do not need to identify features used to classify images. The CNN works by extracting features directly from images. The relevant features are not pretrained; they are learned while the network trains on a collection of images. This automated feature extraction makes deep learning models highly accurate for computer vision tasks such as object classification.
CNNs learn to detect different features of an image using tens or
hundreds of hidden layers. Every hidden layer increases the complexity
of the learned image features. For example, the first hidden layer could
learn how to detect edges, and the last could learn how to detect more
complex shapes specific to the object we are trying to recognize.
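The convolution step, and the idea of an edge-detecting first layer, can be illustrated with a minimal, dependency-free Python sketch. It assumes a grayscale image stored as a list of rows, no padding, and a stride of 1; the function name conv2d_valid and the hand-set 2x2 kernel are illustrative choices, not part of any library. Like most deep learning frameworks, it computes cross-correlation (the kernel is not flipped). In a real CNN the kernel values would be learned during training rather than written by hand.

```python
def conv2d_valid(image, kernel):
    """2D convolution as used in CNN layers: no padding, stride 1.

    Like most deep learning frameworks, this computes cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            acc = 0.0
            # Weighted sum of the image patch under the kernel.
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


# A 4x4 image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-set vertical-edge kernel; in a CNN these weights would be learned.
kernel = [
    [-1, 1],
    [-1, 1],
]
print(conv2d_valid(image, kernel))
# [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The strong response in the middle column marks where dark pixels meet bright ones; deeper layers would combine many such responses into more complex shapes.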
What's the Difference Between Machine Learning and Deep Learning?
Deep learning is a specialized form of machine learning. A machine learning workflow starts with relevant features being manually extracted from images. The features are then used to create a model that categorizes the objects in the image. With a deep learning workflow, relevant features are automatically extracted from images. In addition, deep learning performs “end-to-end learning” – where a network is given raw data and a task to perform, such as classification, and it learns how to do this automatically.
Another key difference is that deep learning algorithms scale with data, whereas shallow learning converges. Shallow learning refers to machine learning methods that plateau at a certain level of performance when you add more examples and training data to the model.
A key advantage of deep learning networks is that they often continue to improve as the size of your data increases.
In machine learning, you manually choose features and a classifier
to sort images. With deep learning, feature extraction and modeling
steps are automatic.
Choosing Between Machine Learning and Deep Learning
Machine learning offers a variety of techniques and models you can choose based on your application, the size of data you're processing, and the type of problem you want to solve. A successful deep learning application requires a very large amount of data (thousands of images) to train the model, as well as GPUs, or graphics processing units, to rapidly process your data.
When choosing between machine learning and deep learning, consider whether you have a high-performance GPU and lots of labeled data. If you lack either of those things, it may make more sense to use machine learning instead of deep learning. Deep learning is generally more complex, so you’ll need at least a few thousand images to get reliable results. Having a high-performance GPU means the model will take less time to analyze all those images.
How to Create and Train Deep Learning Models
The three most common ways people use deep learning to perform object classification are:
Training from Scratch
To train a deep network from scratch, you gather a very large labeled data set and design a network architecture that will learn the features and model. This is good for new applications, or applications that will have a large number of output categories. This approach is less common because, given the large amount of data and the rate of learning, these networks typically take days or weeks to train.
Transfer Learning
Most deep learning applications use the transfer learning approach, a process that involves fine-tuning a pretrained model. You start with an existing network, such as AlexNet or GoogLeNet, and feed in new data containing previously unknown classes. After making some tweaks to the network, you can now perform a new task, such as categorizing only dogs or cats instead of 1000 different objects. This also has the advantage of needing much less data (processing thousands of images, rather than millions), so computation time drops to minutes or hours.
Transfer learning requires an interface to the internals of the pre-existing network, so it can be surgically modified and enhanced for the new task. MATLAB® has tools and functions designed to help you do transfer learning.
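A minimal sketch of the idea in plain Python, under heavy simplifying assumptions: the "pretrained network" is a single hypothetical frozen layer (PRETRAINED_W) rather than AlexNet or GoogLeNet, and only a newly added output layer is trained, here a two-class logistic-regression head fitted by stochastic gradient descent. All names and data are illustrative. Real transfer learning often also fine-tunes some of the pretrained layers.

```python
import math

# Hypothetical frozen weights standing in for a pretrained network's layers.
PRETRAINED_W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def frozen_features(x):
    """Pretrained layer plus ReLU; these weights are never updated below."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in PRETRAINED_W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_new_head(samples, labels, lr=0.5, epochs=200):
    """Train only a new binary output layer (logistic regression) on frozen features."""
    feats = [frozen_features(x) for x in samples]
    w, b = [0.0] * len(PRETRAINED_W), 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            err = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    f = frozen_features(x)
    return int(sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) >= 0.5)


# New task with two classes (say 0 = "cat", 1 = "dog"), using toy 2D inputs.
samples = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]]
labels = [0, 0, 1, 1]
w, b = train_new_head(samples, labels)
print([predict(w, b, x) for x in samples])  # [0, 0, 1, 1]
```

Freezing the pretrained part is what keeps the data and compute requirements small: only the handful of weights in the new head are learned.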
Feature Extraction
A slightly less common, more specialized approach to deep learning is to use the network as a feature extractor. Since all the layers are tasked with learning certain features from images, we can pull these features out of the network at any time during the training process. These features can then be used as input to a machine learning model such as support vector machines (SVM).
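The feature-extraction approach can be sketched in plain Python. Two simplifications are assumptions of this sketch, not the MATLAB workflow: extract_features is a hypothetical stand-in for reading activations out of a trained network layer (here it just averages the left and right halves of a tiny image), and a perceptron, a simpler linear classifier, stands in for the SVM so that the example needs no external libraries. In practice you would pass the extracted features to a real SVM implementation.

```python
def extract_features(image):
    """Hypothetical stand-in for network activations: mean intensity of the
    left and right halves of a 2D image (list of rows)."""
    h, w = len(image), len(image[0])
    half = w // 2
    left = sum(v for row in image for v in row[:half]) / (h * half)
    right = sum(v for row in image for v in row[half:]) / (h * (w - half))
    return [left, right]


def train_perceptron(X, y, epochs=20):
    """Linear classifier on the extracted features; labels are +1 or -1."""
    w = [0.0] * (len(X[0]) + 1)  # last weight is the bias
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            xb = x + [1.0]
            if label * sum(wi * xi for wi, xi in zip(w, xb)) <= 0:
                w = [wi + label * xi for wi, xi in zip(w, xb)]
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return w


def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x + [1.0])) > 0 else -1


images = [
    [[1.0, 0.0], [1.0, 0.0]],  # bright on the left  -> class +1
    [[0.8, 0.1], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],  # bright on the right -> class -1
    [[0.1, 0.9], [0.0, 0.7]],
]
labels = [1, 1, -1, -1]
features = [extract_features(img) for img in images]
w = train_perceptron(features, labels)
print([predict(w, f) for f in features])  # [1, 1, -1, -1]
```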
Accelerating Deep Learning Models with GPUs
Training a deep learning model can take a long time, from days to weeks. Using GPU acceleration can speed up the process significantly. Using MATLAB with a GPU reduces the time required to train a network and can cut the training time for an image classification problem from days down to hours. In training deep learning models, MATLAB uses GPUs (when available) without requiring you to understand how to program GPUs explicitly.
Deep Learning Applications
Pretrained deep neural network models can be used to quickly apply deep learning to your problems by performing transfer learning or feature extraction. For MATLAB users, some available models include AlexNet, VGG-16, and VGG-19, as well as Caffe models (for example, from Caffe Model Zoo) imported using importCaffeNetwork.
Use AlexNet to Recognize Objects with Your Webcam
Use MATLAB, a simple webcam, and a deep neural network to identify objects in your surroundings.
Example: Object Detection Using Deep Learning
In addition to object recognition, which identifies a specific object in an image or video, deep learning can also be used for object detection. Object detection means recognizing and locating the object in a scene, and it allows for multiple objects to be located within the image.
With just a few lines of code, MATLAB lets you do deep learning without being an expert. Get started quickly, create and visualize models, and deploy models to servers and embedded devices.
Teams are successful using MATLAB for deep learning because it lets you:
- Create and Visualize Models with Just a Few Lines of Code. MATLAB lets you build deep learning models with minimal code. With MATLAB, you can quickly import pretrained models and visualize and debug intermediate results as you adjust training parameters.
- Perform Deep Learning Without Being an Expert. You can use MATLAB to learn and gain expertise in the area of deep learning. Most of us have never taken a course in deep learning. We have to learn on the job. MATLAB makes learning about this field practical and accessible. In addition, MATLAB enables domain experts to do deep learning – instead of handing the task over to data scientists who may not know your industry or application.
- Automate Ground Truth Labeling of Images and Video. MATLAB enables users to interactively label objects within images and can automate ground truth labeling within videos for training and testing deep learning models. This interactive and automated approach can lead to better results in less time.
- Integrate Deep Learning in a Single Workflow. MATLAB can unify multiple domains in a single workflow. With MATLAB, you can do your thinking and programming in one environment. It offers tools and functions for deep learning, and also for a range of domains that feed into deep learning algorithms, such as signal processing, computer vision, and data analytics.
Function block diagrams
“A picture is worth a thousand words” is a familiar
proverb that asserts that complex stories can be told with a single
still image, or that an image may be more influential than a substantial
amount of text. It also aptly characterizes the goals of
visualization-based software in industrial control.
A function block diagram (FBD) can replace
thousands of lines from a textual program. Graphical programming is an
intuitive way of specifying system functionality by assembling and
connecting function blocks. The first two parts of this series evaluated
ladder diagrams and textual programming as choices for models of
computation. Here, the strengths and weaknesses of FBDs are discussed
and compared.
Execution control of function blocks in an FBD network is implicit from the function block position in an FBD.
FBDs
were introduced by IEC 61131-3 to overcome the weaknesses associated
with textual programming and ladder diagrams. An FBD network primarily
comprises interconnected functions and function blocks to express system
behavior. Function blocks were introduced to address the need to reuse
common tasks such as proportional-integral-derivative (PID) control,
counters, and timers at different parts of an application or in
different projects. A function block is a packaged element of software
that describes the behavior of data, a data structure and an external
interface defined as a set of input and output parameters.
In many ways, function blocks can
theoretically be compared with integrated circuits that are used in
electronic equipment. A function block is depicted as a rectangular
block with inputs entering from the left and outputs exiting on the
right. See the diagram of a typical function block with inputs and outputs.
Key features of function blocks are data
preservation between executions, encapsulation, and information hiding.
Data preservation is enabled by making a separate copy of a function
block in memory each time it is called. Encapsulation handles a
collection of software elements as one entity, and information hiding
restricts external access to the data and procedures within an encapsulated
element. Because of encapsulation and information hiding, system
designers do not run the risk of accidentally modifying code or
overwriting internal data when copying code from a previous control
solution.
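To make data preservation and information hiding concrete, here is a minimal Python sketch of a counter function block loosely modeled on the IEC 61131-3 up-counter (CTU, with inputs CU, R, PV and outputs Q, CV). It is an illustration, not a conforming implementation: it omits the standard's upper count limit, and Python only hides internal state by the leading-underscore convention. The point is that each instance keeps its own internal variables between calls, so two counters built from the same block never overwrite each other's data.

```python
class CTU:
    """Up-counter function block sketch: counts rising edges of CU."""

    def __init__(self):
        self._prev_cu = False  # internal variable, preserved between calls
        self.cv = 0            # output CV: current count
        self.q = False         # output Q: True once CV reaches the preset PV

    def __call__(self, cu, r, pv):
        if r:                            # reset input has priority
            self.cv = 0
        elif cu and not self._prev_cu:   # rising edge on CU
            self.cv += 1
        self._prev_cu = cu
        self.q = self.cv >= pv
        return self.q, self.cv


# Two independent instances of the same block.
parts = CTU()
rejects = CTU()
for pulse in [True, False, True, False, True]:
    parts(pulse, False, 3)
rejects(True, False, 3)
print(parts.cv, parts.q, rejects.cv, rejects.q)  # 3 True 1 False
```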
Functions, function block diagrams
A
function is a software element that, when executed with a particular
set of input values, produces one primary result and does not have any
internal storage. Functions are often confused with function blocks,
which have internal storage and may have multiple outputs. Some examples
of functions are trigonometric functions like sin() and cos(),
arithmetic functions like add and multiply, and string handling
functions. Function blocks include PID, counters, and timers.
An FBD is a program constructed by connecting
multiple functions and function blocks so that the output of one block
becomes the input for the next. Unlike textual programming, no variables
are necessary to pass data from one subroutine to another because the
wires connecting different blocks automatically encapsulate and transfer
data.
An FBD can be used to express the behavior of
function blocks, as well as programs. It also can be used to describe
steps, actions, and transitions within sequential function charts
(SFCs).
A function block is not evaluated unless all
inputs that come from other elements are available. When a function
block executes, it evaluates all its variables, including input and
internal variables as well as output variables. During its execution,
the algorithm creates new values for the output and internal variables.
As discussed, functions and function blocks are the building blocks of
FBDs. In FBDs, the signals are considered to flow from the outputs of
functions or function blocks to the inputs of other functions or
function blocks.
Outputs of function blocks are updated as a
result of function block evaluations. Changes of signal states and
values therefore naturally propagate from left to right across the FBD
network. The signal also can be fed back from function block outputs to
inputs of the preceding blocks. A feedback path implies that a value
within the path is retained after the FBD network is evaluated and used
as the starting value on the next network evaluation. See FBD network
diagram.
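The feedback behavior can be sketched in Python by evaluating the network once per cycle and keeping the fed-back value between evaluations. The controller and plant below are hypothetical (a proportional controller with gain kp and a first-order plant with smoothing factor alpha); their names only loosely echo the "Control" and "Plant Simulator" blocks, and they are written as plain functions so that the only retained value is the one on the feedback wire.

```python
def control(setpoint, measured, kp=2.0):
    """Hypothetical proportional controller: output = kp * error."""
    return kp * (setpoint - measured)


def plant_simulator(u, y_prev, alpha=0.5):
    """Hypothetical first-order plant: output moves part way toward u."""
    return y_prev + alpha * (u - y_prev)


def run_network(cycles, setpoint=10.0):
    feedback = 0.0  # starting value on the feedback wire
    history = []
    for _ in range(cycles):
        # One left-to-right evaluation: Control first, then Plant Simulator.
        u = control(setpoint, feedback)
        # The plant output is fed back; its value is retained and used as the
        # starting value for the next evaluation.
        feedback = plant_simulator(u, feedback)
        history.append(feedback)
    return history


print(run_network(4))  # [10.0, 5.0, 7.5, 6.25]
```

Run for more cycles, the output settles at 20/3 (about 6.67) rather than the setpoint of 10, the steady-state offset typical of proportional-only control.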
The execution control of function blocks in an
FBD network is implicit from the position of the function block in an
FBD. For example, in the “FBD network...” diagram, the “Plant Simulator”
function is evaluated after the “Control” function block. Execution
order can be controlled by enabling a function block for execution and
having output terminals that change state once execution is complete.
Execution of an FBD network is considered complete only when all outputs
of all functions and function blocks are updated.
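One way an editor or runtime could derive an execution order from the wiring alone is a topological sort of the data dependencies (Kahn's algorithm). The sketch below is an assumption about how that might be done, not how any particular IEC 61131-3 tool works: each block is described by a function and the names of its sources, a block runs only once all of its inputs have values, and a network that loops back on itself without a stored feedback value is rejected.

```python
from collections import deque


def evaluate_network(blocks, inputs):
    """Evaluate an acyclic dataflow network in data-dependency order.

    blocks maps a block name to (function, [source names]); each source is
    either a key of `inputs` (an external signal) or another block's name.
    Returns (values, execution_order).
    """
    values = dict(inputs)
    pending = {}  # number of unresolved block inputs per block
    consumers = {name: [] for name in blocks}
    for name, (_, sources) in blocks.items():
        block_sources = [s for s in sources if s in blocks]
        pending[name] = len(block_sources)
        for s in block_sources:
            consumers[s].append(name)

    ready = deque(name for name, count in pending.items() if count == 0)
    order = []
    while ready:
        name = ready.popleft()
        func, sources = blocks[name]
        values[name] = func(*(values[s] for s in sources))  # all inputs available
        order.append(name)
        for c in consumers[name]:
            pending[c] -= 1
            if pending[c] == 0:
                ready.append(c)

    if len(order) != len(blocks):
        raise ValueError("network contains a cycle")
    return values, order


# Blocks are listed out of order on purpose; the wiring determines execution.
network = {
    "scale": (lambda e: 2 * e, ["error"]),
    "error": (lambda sp, pv: sp - pv, ["setpoint", "measured"]),
    "limit": (lambda u: max(-5, min(5, u)), ["scale"]),
}
values, order = evaluate_network(network, {"setpoint": 10, "measured": 7})
print(order, values["limit"])  # ['error', 'scale', 'limit'] 5
```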
Signals from outputs of function blocks can become inputs to other functions.
Strengths of FBD
Some FBD strengths follow.
Intuitive and easy to program.
Because FBDs are graphical, it is easy for system designers without
extensive programming training to understand and program control logic.
This benefits domain experts who may not necessarily be experts at
writing specific control algorithms in textual languages but understand
the logic of the control algorithm. They can use existing function
blocks to easily construct programs for data acquisition, and process
and discrete control.
Extensive code reuse. One of
the main benefits of function blocks is code reuse. As discussed, system
designers can use existing function blocks such as PIDs and filters or
encapsulate custom logic and easily reuse this code throughout programs.
Since separate copies are made every time these function blocks are
called, system designers do not risk accidentally overwriting data.
Additionally, function blocks can be invoked from ladder diagrams
and even textual languages such as structured text, making them highly
portable among different models of computation.
Parallel execution. With the
introduction of multiple-processor-based systems, programmable
automation controllers and PCs now can execute multiple functions at the
same time. Graphical programming languages, such as FBDs, can
efficiently represent parallel logic. While textual programmers use
specific threading and timing libraries to take advantage of
multithreading, graphical, FBD, and dataflow languages (such as National
Instruments LabVIEW) can automatically execute parallel function blocks
in different threads. This helps in applications requiring advanced
control, including multiple PIDs in parallel.
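Because each function block instance owns its own state, independent blocks share no data and can be evaluated at the same time. The Python sketch below shows that structure with a minimal discrete PID block (a hypothetical class, positional form, fixed time step dt) and a thread pool. It illustrates the idea rather than real-time behavior; in CPython the global interpreter lock limits true CPU parallelism for pure-Python code.

```python
from concurrent.futures import ThreadPoolExecutor


class PID:
    """Minimal discrete PID function block with its own internal state."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self._integral = 0.0    # internal variable
        self._prev_error = 0.0  # internal variable

    def __call__(self, setpoint, measured):
        error = setpoint - measured
        self._integral += error * self.dt
        derivative = (error - self._prev_error) / self.dt
        self._prev_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative


# Two independent control loops (hypothetical gains and signals).
loops = [PID(kp=1.0, ki=0.5, kd=0.0, dt=0.1), PID(kp=2.0, ki=0.0, kd=0.0, dt=0.1)]
signals = [(10.0, 8.0), (5.0, 5.5)]

# Each instance is evaluated in its own worker; no state is shared.
with ThreadPoolExecutor(max_workers=len(loops)) as pool:
    outputs = list(pool.map(lambda pair: pair[0](*pair[1]), zip(loops, signals)))
print(outputs)
```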
Execution traceability and easy debugging.
Graphical data flow of FBDs makes debugging easy as system designers
can follow the wire connections between functions and function blocks.
Many FBD program editors (such as Siemens Step 7) also provide animation
showing data flow to make debugging easier.
Weaknesses of FBD
Some FBD weaknesses follow.
Algorithm development.
Low-level functions and mathematical algorithms are traditionally
represented in text functions; even algorithms for function blocks
conventionally have been written using textual programming. Furthermore,
function blocks abstract the intricacies of an algorithm, making it
difficult for domain experts trying to learn the details of advanced
control and signal processing techniques.
Limited execution control.
Execution of an FBD network is left to right and is suitable for
continuous behavior. While system designers can control the execution of
a network through “jump” constructs and also by using data dependency
between two function blocks, FBDs are not ideal for solving sequencing
problems. For instance, going from “tank fill” state to “tank stir”
state requires evaluation of all the current states. Depending on the
output, a transition action has to take place before moving to the next
state. While this can be achieved using data dependency of function
blocks, such sequencing might require significant time and effort.
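For comparison, the fill-then-stir sequence is compact when written as an explicit state machine, which is close to what sequential function charts express graphically. The sketch below is hypothetical: the level threshold, stir time, and action names (open_inlet, close_inlet, start_mixer, stop_mixer) are invented for illustration. Each call to step represents one controller scan and returns the next state plus the actions to perform, including the transition action taken before moving to the next state.

```python
from enum import Enum


class State(Enum):
    FILL = "tank fill"
    STIR = "tank stir"
    DONE = "done"


def step(state, level, stir_elapsed, full_level=100.0, stir_time=30.0):
    """One scan of a hypothetical fill-then-stir sequence.

    Returns (next_state, actions).
    """
    if state is State.FILL:
        if level >= full_level:
            # Transition action: close the inlet before starting the mixer.
            return State.STIR, ["close_inlet", "start_mixer"]
        return State.FILL, ["open_inlet"]
    if state is State.STIR:
        if stir_elapsed >= stir_time:
            return State.DONE, ["stop_mixer"]
        return State.STIR, []
    return State.DONE, []


state = State.FILL
for level, elapsed in [(40.0, 0.0), (100.0, 0.0), (100.0, 10.0), (100.0, 30.0)]:
    state, actions = step(state, level, elapsed)
    print(state.value, actions)
```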
IT integration. With businesses
increasingly seeking ways to connect modern factory floors to the
enterprise, connectivity to the Web and databases has become extremely
important. While textual programs have database-logging capabilities and
source code control features, FBDs generally are unable to integrate
natively with IT systems. Furthermore, IT managers are often trained
only in textual programming.
Need for training. Although
intuitive, data flow is not commonly taught as a model of computation.
In the U.S., engineers are trained to use textual languages, such as
C++, Fortran, and Visual Basic, and technicians are trained in ladder
logic or electrical circuits. FBDs require added training, as they
represent a paradigm shift in writing a control program.
FBDs are a graphical way of representing a
control program and are a dataflow programming model. The intuitiveness,
ease of use, and code reuse of FBDs make them very popular with
engineers. FBDs are ideal for complex applications with parallel
execution and for continuous processing. They also effectively fill gaps
in ladder logic, such as encapsulation and code reuse. To overcome some
of their weaknesses, engineers must employ mixed models of computation.
FBDs are used in conjunction with textual programming for algorithms
and IT integration. Batch and discrete operations are improved by adding
SFCs. The SFC model of computation addresses some of the challenges
faced by FBDs and will be covered in the fourth installment of this
five-part series.
Artificial intelligence
Artificial intelligence (AI), the ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings. The term is frequently applied to the project of developing systems endowed with the intellectual processes characteristic of humans, such as the ability to reason, discover meaning, generalize, or learn from past experience. Since the development of the digital computer in the 1940s, it has been demonstrated that computers can be programmed to carry out very complex tasks—as, for example, discovering proofs for mathematical theorems or playing chess—with great proficiency. Still, despite continuing advances in computer processing speed and memory capacity, there are as yet no programs that can match human flexibility over wider domains or in tasks requiring much everyday knowledge. On the other hand, some programs have attained the performance levels of human experts and professionals in performing certain specific tasks, so that artificial intelligence in this limited sense is found in applications as diverse as medical diagnosis, computer search engines, and voice or handwriting recognition.
What is intelligence?
All but the simplest human behaviour is ascribed to intelligence, while even the most complicated insect behaviour is never taken as an indication of intelligence. What is the difference? Consider the behaviour of the digger wasp, Sphex ichneumoneus. When the female wasp returns to her burrow with food, she first deposits it on the threshold, checks for intruders inside her burrow, and only then, if the coast is clear, carries her food inside. The real nature of the wasp’s instinctual behaviour is revealed if the food is moved a few inches away from the entrance to her burrow while she is inside: on emerging, she will repeat the whole procedure as often as the food is displaced. Intelligence—conspicuously absent in the case of Sphex—must include the ability to adapt to new circumstances.
Psychologists generally do not characterize human intelligence by just one trait but by the combination of many diverse abilities. Research in AI has focused chiefly on the following components of intelligence: learning, reasoning, problem solving, perception, and using language.
Learning
There are a number of different forms of learning as applied to artificial intelligence. The simplest is learning by trial and error. For example, a simple computer program for solving mate-in-one chess problems might try moves at random until mate is found. The program might then store the solution with the position so that the next time the computer encountered the same position it would recall the solution. This simple memorizing of individual items and procedures—known as rote learning—is relatively easy to implement on a computer. More challenging is the problem of implementing what is called generalization. Generalization involves applying past experience to analogous new situations. For example, a program that learns the past tense of regular English verbs by rote will not be able to produce the past tense of a word such as jump unless it previously had been presented with jumped, whereas a program that is able to generalize can learn the “add ed” rule and so form the past tense of jump based on experience with similar verbs.
Reasoning
To reason is to draw inferences appropriate to the situation. Inferences are classified as either deductive or inductive. An example of the former is, “Fred must be in either the museum or the café. He is not in the café; therefore he is in the museum,” and of the latter, “Previous accidents of this sort were caused by instrument failure; therefore this accident was caused by instrument failure.” The most significant difference between these forms of reasoning is that in the deductive case the truth of the premises guarantees the truth of the conclusion, whereas in the inductive case the truth of the premise lends support to the conclusion without giving absolute assurance. Inductive reasoning is common in science, where data are collected and tentative models are developed to describe and predict future behaviour—until the appearance of anomalous data forces the model to be revised. Deductive reasoning is common in mathematics and logic, where elaborate structures of irrefutable theorems are built up from a small set of basic axioms and rules.
There has been considerable success in programming computers to draw inferences, especially deductive inferences. However, true reasoning involves more than just drawing inferences; it involves drawing inferences relevant to the solution of the particular task or situation. This is one of the hardest problems confronting AI.
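The deductive case can be checked mechanically for simple propositional arguments: an argument is valid when no assignment of truth values makes every premise true and the conclusion false. The Python sketch below (is_valid is an illustrative helper, and the variables m and c are labels introduced here) applies that test to the museum/café example. No comparable check settles the inductive example, which is the point of the distinction: its premise supports, but does not guarantee, its conclusion.

```python
from itertools import product


def is_valid(premises, conclusion, num_vars):
    """True if the conclusion holds in every truth assignment that makes
    all the premises true (i.e. the argument is deductively valid)."""
    for values in product([False, True], repeat=num_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False  # counterexample: true premises, false conclusion
    return True


# m: "Fred is in the museum"; c: "Fred is in the café".
premises = [
    lambda m, c: m or c,  # Fred is in either the museum or the café.
    lambda m, c: not c,   # He is not in the café.
]


def conclusion(m, c):
    return m  # Therefore he is in the museum.


print(is_valid(premises, conclusion, 2))  # True
```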
Problem solving
Problem solving, particularly in artificial intelligence, may be characterized as a systematic search through a range of possible actions in order to reach some predefined goal or solution. Problem-solving methods divide into special purpose and general purpose. A special-purpose method is tailor-made for a particular problem and often exploits very specific features of the situation in which the problem is embedded. In contrast, a general-purpose method is applicable to a wide variety of problems. One general-purpose technique used in AI is means-end analysis—a step-by-step, or incremental, reduction of the difference between the current state and the final goal. The program selects actions from a list of means—in the case of a simple robot this might consist of PICKUP, PUTDOWN, MOVEFORWARD, MOVEBACK, MOVELEFT, and MOVERIGHT—until the goal is reached.
Many diverse problems have been solved by artificial intelligence programs. Some examples are finding the winning move (or sequence of moves) in a board game, devising mathematical proofs, and manipulating “virtual objects” in a computer-generated world.
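A toy version of means-end analysis for the simple robot can be sketched in Python. The grid, the (x, y) coordinates, the effect assigned to each move, and the use of Manhattan distance as the measure of difference are all assumptions made for illustration; PICKUP and PUTDOWN are left out because only position matters here. This greedy version always applies the action that most reduces the difference, a simplification of full means-end analysis, which can also set up subgoals when no single action helps.

```python
# Hypothetical effect of each action on an (x, y) grid position.
ACTIONS = {
    "MOVEFORWARD": (0, 1),
    "MOVEBACK": (0, -1),
    "MOVELEFT": (-1, 0),
    "MOVERIGHT": (1, 0),
}


def apply_action(state, action):
    dx, dy = ACTIONS[action]
    return (state[0] + dx, state[1] + dy)


def difference(state, goal):
    """Manhattan distance between the current state and the goal."""
    return abs(goal[0] - state[0]) + abs(goal[1] - state[1])


def means_end_plan(start, goal, max_steps=100):
    """Repeatedly choose the action that most reduces the difference."""
    state, plan = start, []
    while difference(state, goal) > 0:
        if len(plan) >= max_steps:
            raise RuntimeError("goal not reached within max_steps")
        best = min(ACTIONS, key=lambda a: difference(apply_action(state, a), goal))
        state = apply_action(state, best)
        plan.append(best)
    return plan


print(means_end_plan((0, 0), (2, 1)))  # ['MOVEFORWARD', 'MOVERIGHT', 'MOVERIGHT']
```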
Perception
In perception the environment is scanned by means of various sensory organs, real or artificial, and the scene is decomposed into separate objects in various spatial relationships. Analysis is complicated by the fact that an object may appear different depending on the angle from which it is viewed, the direction and intensity of illumination in the scene, and how much the object contrasts with the surrounding field.
At present, artificial perception is sufficiently well advanced to enable optical sensors to identify individuals, autonomous vehicles to drive at moderate speeds on the open road, and robots to roam through buildings collecting empty soda cans. One of the earliest systems to integrate perception and action was FREDDY, a stationary robot with a moving television eye and a pincer hand, constructed at the University of Edinburgh, Scotland, during the period 1966–73 under the direction of Donald Michie. FREDDY was able to recognize a variety of objects and could be instructed to assemble simple artifacts, such as a toy car, from a random heap of components.
Language
A language is a system of signs having meaning by convention. In this sense, language need not be confined to the spoken word. Traffic signs, for example, form a minilanguage, it being a matter of convention that a particular warning sign means “hazard ahead” in some countries. It is distinctive of languages that linguistic units possess meaning by convention, and linguistic meaning is very different from what is called natural meaning, exemplified in statements such as “Those clouds mean rain” and “The fall in pressure means the valve is malfunctioning.”
An important characteristic of full-fledged human languages—in contrast to birdcalls and traffic signs—is their productivity. A productive language can formulate an unlimited variety of sentences.
It is relatively easy to write computer programs that seem able, in severely restricted contexts, to respond fluently in a human language to questions and statements. Although none of these programs actually understands language, they may, in principle, reach the point where their command of a language is indistinguishable from that of a normal human. What, then, is involved in genuine understanding, if even a computer that uses language like a native human speaker is not acknowledged to understand? There is no universally agreed upon answer to this difficult question. According to one theory, whether or not one understands depends not only on one’s behaviour but also on one’s history: in order to be said to understand, one must have learned the language and have been trained to take one’s place in the linguistic community by means of interaction with other language users.
Methods and goals in AI
Symbolic vs. connectionist approaches
AI research follows two distinct, and to some extent competing, methods: the symbolic (or “top-down”) approach and the connectionist (or “bottom-up”) approach. The top-down approach seeks to replicate intelligence by analyzing cognition independent of the biological structure of the brain, in terms of the processing of symbols—whence the symbolic label. The bottom-up approach, on the other hand, involves creating artificial neural networks in imitation of the brain’s structure—whence the connectionist label.
To illustrate the difference between these approaches, consider the task of building a system, equipped with an optical scanner, that recognizes the letters of the alphabet. A bottom-up approach typically involves training an artificial neural network by presenting letters to it one by one, gradually improving performance by “tuning” the network. (Tuning adjusts the responsiveness of different neural pathways to different stimuli.) In contrast, a top-down approach typically involves writing a computer program that compares each letter with geometric descriptions. Simply put, neural activities are the basis of the bottom-up approach, while symbolic descriptions are the basis of the top-down approach.
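The letter-recognition contrast can be made concrete with a toy Python sketch. The 3x3 bitmaps for L and T are invented for illustration. The top-down function compares the input with stored descriptions (here simply pixel templates, a crude stand-in for geometric descriptions), while the bottom-up function is a single-layer perceptron whose weights are tuned one example at a time.

```python
# Hypothetical 3x3 letter bitmaps, flattened row by row (1 = ink).
L = [1, 0, 0,
     1, 0, 0,
     1, 1, 1]
T = [1, 1, 1,
     0, 1, 0,
     0, 1, 0]
TEMPLATES = {"L": L, "T": T}


def top_down(pixels):
    """Symbolic style: pick the stored description that differs least."""
    return min(TEMPLATES, key=lambda k: sum(a != b for a, b in zip(pixels, TEMPLATES[k])))


def train_bottom_up(examples, epochs=10):
    """Connectionist style: tune perceptron weights example by example.
    Labels are +1 for "L" and -1 for "T"."""
    w, b = [0.0] * 9, 0.0
    for _ in range(epochs):
        for pixels, label in examples:
            if label * (sum(wi * xi for wi, xi in zip(w, pixels)) + b) <= 0:
                # Misclassified: strengthen or weaken the active connections.
                w = [wi + label * xi for wi, xi in zip(w, pixels)]
                b += label
    return w, b


def bottom_up(w, b, pixels):
    return "L" if sum(wi * xi for wi, xi in zip(w, pixels)) + b > 0 else "T"


w, b = train_bottom_up([(L, 1), (T, -1)])
noisy_L = [1, 0, 0,
           1, 0, 0,
           1, 1, 0]  # bottom-right pixel missing
print(top_down(noisy_L), bottom_up(w, b, noisy_L))  # L L
```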
In The Fundamentals of Learning (1932), Edward Thorndike, a psychologist at Columbia University, New York City, first suggested that human learning consists of some unknown property of connections between neurons in the brain. In The Organization of Behavior (1949), Donald Hebb, a psychologist at McGill University, Montreal, Canada, suggested that learning specifically involves strengthening certain patterns of neural activity by increasing the probability (weight) of induced neuron firing between the associated connections. The notion of weighted connections is described in a later section, Connectionism.
In 1957 two vigorous advocates of symbolic AI—Allen Newell, a researcher at the RAND Corporation, Santa Monica, California, and Herbert Simon, a psychologist and computer scientist at Carnegie Mellon University, Pittsburgh, Pennsylvania—summed up the top-down approach in what they called the physical symbol system hypothesis. This hypothesis states that processing structures of symbols is sufficient, in principle, to produce artificial intelligence in a digital computer and that, moreover, human intelligence is the result of the same type of symbolic manipulations.
During the 1950s and ’60s the top-down and bottom-up approaches were pursued simultaneously, and both achieved noteworthy, if limited, results. During the 1970s, however, bottom-up AI was neglected, and it was not until the 1980s that this approach again became prominent. Nowadays both approaches are followed, and both are acknowledged as facing difficulties. Symbolic techniques work in simplified realms but typically break down when confronted with the real world; meanwhile, bottom-up researchers have been unable to replicate the nervous systems of even the simplest living things. Caenorhabditis elegans, a much-studied worm, has approximately 300 neurons whose pattern of interconnections is perfectly known. Yet connectionist models have failed to mimic even this worm. Evidently, the neurons of connectionist theory are gross oversimplifications of the real thing.
Strong AI, applied AI, and cognitive simulation
Employing the methods outlined above, AI research attempts to reach one of three goals: strong AI, applied AI, or cognitive simulation. Strong AI aims to build machines that think. (The term strong AI was introduced for this category of research in 1980 by the philosopher John Searle of the University of California at Berkeley.) The ultimate ambition of strong AI is to produce a machine whose overall intellectual ability is indistinguishable from that of a human being. As is described in the section Early milestones in AI, this goal generated great interest in the 1950s and ’60s, but such optimism has given way to an appreciation of the extreme difficulties involved. To date, progress has been meagre. Some critics doubt whether research will produce even a system with the overall intellectual ability of an ant in the foreseeable future. Indeed, some researchers working in AI’s other two branches view strong AI as not worth pursuing.
Applied AI, also known as advanced information processing, aims to produce commercially viable “smart” systems—for example, “expert” medical diagnosis systems and stock-trading systems. Applied AI has enjoyed considerable success, as described in the section Expert systems.
In cognitive simulation, computers are used to test theories about how the human mind works—for example, theories about how people recognize faces or recall memories. Cognitive simulation is already a powerful tool in both neuroscience and cognitive psychology.
Alan Turing and the beginning of AI
Theoretical work
The earliest substantial work in the field of artificial intelligence was done in the mid-20th century by the British logician and computer pioneer Alan Mathison Turing. In 1935 Turing described an abstract computing machine consisting of a limitless memory and a scanner that moves back and forth through the memory, symbol by symbol, reading what it finds and writing further symbols. The actions of the scanner are dictated by a program of instructions that also is stored in the memory in the form of symbols. This is Turing’s stored-program concept, and implicit in it is the possibility of the machine operating on, and so modifying or improving, its own program. Turing’s conception is now known simply as the universal Turing machine. All modern computers are in essence universal Turing machines.
During World War II, Turing was a leading cryptanalyst at the Government Code and Cypher School in Bletchley Park, Buckinghamshire, England. Turing could not turn to the project of building a stored-program electronic computing machine until the cessation of hostilities in Europe in 1945. Nevertheless, during the war he gave considerable thought to the issue of machine intelligence. One of Turing’s colleagues at Bletchley Park, Donald Michie (who later founded the Department of Machine Intelligence and Perception at the University of Edinburgh), later recalled that Turing often discussed how computers could learn from experience as well as solve new problems through the use of guiding principles—a process now known as heuristic problem solving.
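The abstract machine Turing described, a scanner that reads and writes symbols on a limitless memory under the control of a program, can be simulated in a few lines of Python. This sketch makes assumptions of its own: the program is a lookup table from (state, symbol) to (symbol to write, head move, next state), the tape is a dictionary so it can grow in either direction, and the example program (binary increment) is invented for illustration. Because the program lives in a separate table rather than on the tape, this is an ordinary Turing machine, not a universal one; a universal machine would read another machine's program from its own memory, as in Turing's stored-program concept.

```python
def run_turing_machine(program, tape, state="start", blank="_", max_steps=10_000):
    """Simulate a single-tape Turing machine starting at the leftmost cell.

    program maps (state, symbol) -> (symbol_to_write, move, next_state),
    where move is -1 (left) or +1 (right). The machine stops in state "halt".
    """
    cells = dict(enumerate(tape))  # sparse tape, unbounded in both directions
    head, steps = 0, 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        symbol = cells.get(head, blank)
        write, move, state = program[(state, symbol)]
        cells[head] = write
        head += move
        steps += 1
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Binary increment: scan right to the end of the number, then add 1 with carry.
INCREMENT = {
    ("start", "0"): ("0", +1, "start"),
    ("start", "1"): ("1", +1, "start"),
    ("start", "_"): ("_", -1, "carry"),
    ("carry", "1"): ("0", -1, "carry"),
    ("carry", "0"): ("1", -1, "halt"),
    ("carry", "_"): ("1", -1, "halt"),
}

print(run_turing_machine(INCREMENT, "1011"))  # 1100
```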
Turing gave quite possibly the earliest public lecture (London, 1947) to mention computer intelligence, saying, “What we want is a machine that can learn from experience,” and that the “possibility of letting the machine alter its own instructions provides the mechanism for this.” In 1948 he introduced many of the central concepts of AI in a report entitled “Intelligent Machinery.” However, Turing did not publish this paper, and many of his ideas were later reinvented by others. For instance, one of Turing’s original ideas was to train a network of artificial neurons to perform specific tasks, an approach described in the section Connectionism.
Chess
At Bletchley Park, Turing illustrated his ideas on machine intelligence by reference to chess—a useful source of challenging and clearly defined problems against which proposed methods for problem solving could be tested. In principle, a chess-playing computer could play by searching exhaustively through all the available moves, but in practice this is impossible because it would involve examining an astronomically large number of moves. Heuristics are necessary to guide a narrower, more discriminative search. Although Turing experimented with designing chess programs, he had to content himself with theory in the absence of a computer to run his chess program. The first true AI programs had to await the arrival of stored-program electronic digital computers.
In 1945 Turing predicted that computers would one day play very good chess, and just over 50 years later, in 1997, Deep Blue, a chess computer built by the International Business Machines Corporation (IBM), beat the reigning world champion, Garry Kasparov, in a six-game match. While Turing’s prediction came true, his expectation that chess programming would contribute to the understanding of how human beings think did not. The huge improvement in computer chess since Turing’s day is attributable to advances in computer engineering rather than advances in AI—Deep Blue’s 256 parallel processors enabled it to examine 200 million possible moves per second and to look ahead as many as 14 turns of play. Many agree with Noam Chomsky, a linguist at the Massachusetts Institute of Technology (MIT), who opined that a computer beating a grandmaster at chess is about as interesting as a bulldozer winning an Olympic weightlifting competition.
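The contrast between exhaustive search and heuristic-guided search can be seen on a game far smaller than chess. The Python sketch below uses negamax, a compact form of minimax search, on a take-away game chosen purely for illustration; the game, the evaluation function, and the depth limits are assumptions, and this is not how Deep Blue itself was programmed. In this toy game the rule of thumb happens to be exact, whereas chess evaluation functions are only approximate.

```python
# Toy game standing in for chess: players alternately take 1, 2, or 3 stones
# from a pile, and whoever takes the last stone wins.
MOVES = (1, 2, 3)


def rule_of_thumb(stones):
    """Heuristic estimate for the player to move: piles that are multiples of 4 are bad."""
    return -1 if stones % 4 == 0 else 1


def negamax(stones, depth, evaluate, counter):
    """Score a position for the player to move (+1 win, -1 loss), searching `depth` plies."""
    counter[0] += 1  # count positions examined
    if stones == 0:
        return -1  # the opponent just took the last stone
    if depth == 0:
        return evaluate(stones)  # search cut off: fall back on the heuristic
    return max(-negamax(stones - take, depth - 1, evaluate, counter)
               for take in MOVES if take <= stones)


def best_move(stones, depth, evaluate=rule_of_thumb):
    """Return (move, positions examined) for a search limited to `depth` >= 1 plies."""
    counter = [0]
    scores = {take: -negamax(stones - take, depth - 1, evaluate, counter)
              for take in MOVES if take <= stones}
    return max(scores, key=scores.get), counter[0]


print(best_move(10, depth=10))  # exhaustive search: prints (2, 599)
print(best_move(10, depth=1))   # one ply plus the heuristic: prints (2, 3)
```

Both searches choose to take 2 stones, but the exhaustive one examines 599 positions to do so and the heuristic one examines 3. In chess the exhaustive tree is astronomically larger, which is why a heuristic cutoff is unavoidable.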
The Turing test
In 1950 Turing sidestepped the traditional debate concerning the definition of intelligence, introducing a practical test for computer intelligence that is now known simply as the Turing test. The Turing test involves three participants: a computer, a human interrogator, and a human foil. The interrogator attempts to determine, by asking questions of the other two participants, which is the computer. All communication is via keyboard and display screen. The interrogator may ask questions as penetrating and wide-ranging as he or she likes, and the computer is permitted to do everything possible to force a wrong identification. (For instance, the computer might answer, “No,” in response to, “Are you a computer?” and might follow a request to multiply one large number by another with a long pause and an incorrect answer.) The foil must help the interrogator to make a correct identification. A number of different people play the roles of interrogator and foil, and, if a sufficient proportion of the interrogators are unable to distinguish the computer from the human being, then (according to proponents of Turing’s test) the computer is considered an intelligent, thinking entity.
Early milestones in AI
The first AI programs
The earliest successful AI program was written in 1951 by Christopher Strachey, later director of the Programming Research Group at the University of Oxford. Strachey’s checkers (draughts) program ran on the Ferranti Mark I computer at the University of Manchester, England. By the summer of 1952 this program could play a complete game of checkers at a reasonable speed.
Information about the earliest successful demonstration of machine learning was published in 1952. Shopper, written by Anthony Oettinger at the University of Cambridge, ran on the EDSAC computer. Shopper’s simulated world was a mall of eight shops. When instructed to purchase an item, Shopper would search for it, visiting shops at random until the item was found. While searching, Shopper would memorize a few of the items stocked in each shop visited (just as a human shopper might). The next time Shopper was sent out for the same item, or for some other item that it had already located, it would go to the right shop straight away. This simple form of learning, as is pointed out in the introductory section What is intelligence?, is called rote learning.
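Shopper’s rote learning amounts to a lookup table filled in as a side effect of search. The Python sketch below mirrors the description above; the shop names, stock lists, number of items memorized per visit, and the class and method names are all hypothetical, and the sketch is not a reconstruction of Oettinger’s EDSAC program.

```python
import random

# A hypothetical mall of eight shops; the stock lists are invented for illustration.
MALL = {
    "grocer": {"tea", "bread", "eggs"},
    "chemist": {"soap", "aspirin", "plasters"},
    "hardware": {"nails", "glue", "string"},
    "stationer": {"pens", "paper", "ink"},
    "baker": {"buns", "cake"},
    "butcher": {"sausages", "mince"},
    "toyshop": {"kite", "marbles"},
    "newsagent": {"newspaper", "sweets"},
}


class Shopper:
    """Rote learner: random search, plus memory of items seen in shops it visits."""

    def __init__(self, mall, remember_per_shop=2, seed=0):
        self.mall = mall
        self.remember_per_shop = remember_per_shop
        self.memory = {}                 # item -> shop where it was seen
        self.rng = random.Random(seed)

    def buy(self, item):
        """Return (shop where the item was found, number of shops visited)."""
        if item in self.memory:
            return self.memory[item], 1  # recall: go straight to the right shop
        if not any(item in stock for stock in self.mall.values()):
            raise ValueError(f"no shop stocks {item!r}")
        shops = list(self.mall)
        visits = 0
        while True:
            shop = self.rng.choice(shops)  # visit shops at random
            visits += 1
            stock = sorted(self.mall[shop])
            # Memorize a few of the items stocked here, as a human shopper might.
            for seen in self.rng.sample(stock, min(self.remember_per_shop, len(stock))):
                self.memory[seen] = shop
            if item in self.mall[shop]:
                self.memory[item] = shop
                return shop, visits


shopper = Shopper(MALL)
print(shopper.buy("soap"))  # the first search may take several random visits
print(shopper.buy("soap"))  # prints ('chemist', 1)
```

Any item the shopper happened to memorize during an earlier search is also found in a single visit, which matches the description of going “to the right shop straight away.”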
The first AI program to run in the United States also was a checkers program, written in 1952 by Arthur Samuel for the prototype of the IBM 701. Samuel took over the essentials of Strachey’s checkers program and over a period of years considerably extended it. In 1955 he added features that enabled the program to learn from experience. Samuel included mechanisms for both rote learning and generalization, enhancements that eventually led to his program’s winning one game against a former Connecticut checkers champion in 1962.
Evolutionary computing
Samuel’s checkers program was also notable for being one of the first efforts at evolutionary computing. (His program “evolved” by pitting a modified copy against the current best version of his program, with the winner becoming the new standard.) Evolutionary computing typically involves the use of some automatic method of generating and evaluating successive “generations” of a program, until a highly proficient solution evolves.
A leading proponent of evolutionary computing, John Holland, also wrote test software for the prototype of the IBM 701 computer. In particular, he helped design a neural-network “virtual” rat that could be trained to navigate through a maze. This work convinced Holland of the efficacy of the bottom-up approach. While continuing to consult for IBM, Holland moved to the University of Michigan in 1952 to pursue a doctorate in mathematics. He soon switched, however, to a new interdisciplinary program in computers and information processing (later known as communications science) created by Arthur Burks, one of the builders of ENIAC and its successor EDVAC. In his 1959 dissertation, for most likely the world’s first computer science Ph.D., Holland proposed a new type of computer—a multiprocessor computer—that would assign each artificial neuron in a network to a separate processor. (In 1985 Daniel Hillis solved the engineering difficulties to build the first such computer, the 65,536-processor Thinking Machines Corporation supercomputer.)
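The copy-versus-champion scheme described for Samuel’s program can be written as a short loop. In the Python sketch below, the “program” being evolved is only a list of bits and a “match” is won by having more 1s; these stand-ins and the function names are assumptions chosen to keep the example small, and Samuel’s program, which evolved checkers-playing parameters through actual play, is not being reproduced. Keeping a single champion and one mutated challenger is a simple form of hill climbing; genetic algorithms of the kind Holland developed typically maintain a whole population and recombine its members.

```python
import random


def evolve(champion, mutate, beats, generations, seed=0):
    """Copy-versus-champion loop: a mutated copy replaces the champion only if it wins."""
    rng = random.Random(seed)
    for _ in range(generations):
        challenger = mutate(champion, rng)
        if beats(challenger, champion):
            champion = challenger  # the winner becomes the new standard
    return champion


# Toy stand-in for a game-playing program: a list of bits, where more 1s is "stronger".
def flip_one_bit(bits, rng):
    """Return a modified copy with one randomly chosen bit flipped."""
    child = list(bits)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child


def stronger(a, b):
    """Stand-in for playing a match: more 1s wins; a tie keeps the current champion."""
    return sum(a) > sum(b)


best = evolve([0] * 20, flip_one_bit, stronger, generations=1000)
print(sum(best))
```

Because a challenger is accepted only when it strictly wins, the champion never gets worse from one generation to the next, which is the property that lets such a loop improve without any outside teacher.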
Holland joined the faculty at Michigan after graduation and over the next four decades directed much of the research into methods of automating evolutionary computing, a process now known by the term genetic algorithms. Systems implemented in Holland’s laboratory included a chess program, models of single-cell biological organisms, and a classifier system for controlling a simulated gas-pipeline network. Genetic algorithms are no longer restricted to “academic” demonstrations, however; in one important practical application, a genetic algorithm cooperates with a witness to a crime in order to generate a portrait of the criminal.
Logical reasoning and problem solving
The ability to reason logically is an important aspect of intelligence and has always been a major focus of AI research. An important landmark in this area was a theorem-proving program written in 1955–56 by Allen Newell and J. Clifford Shaw of the RAND Corporation and Herbert Simon of the Carnegie Institute of Technology (now Carnegie Mellon University). The Logic Theorist, as the program became known, was designed to prove theorems from Principia Mathematica (1910–13), a three-volume work by the British philosopher-mathematicians Alfred North Whitehead and Bertrand Russell. In one instance, a proof devised by the program was more elegant than the proof given in the books.
Newell, Simon, and Shaw went on to write a more powerful program, the General Problem Solver, or GPS. The first version of GPS ran in 1957, and work continued on the project for about a decade. GPS could solve an impressive variety of puzzles using a trial-and-error approach. However, one criticism of GPS, and similar programs that lack any learning capability, is that the program’s intelligence is entirely secondhand, coming from whatever information the programmer explicitly includes.
English dialogue
Two of the best-known early AI programs, Eliza and Parry, gave an eerie semblance of intelligent conversation. (Details of both were first published in 1966.) Eliza, written by Joseph Weizenbaum of MIT’s AI Laboratory, simulated a human therapist. Parry, written by Stanford University psychiatrist Kenneth Colby, simulated a human paranoiac. Psychiatrists who were asked to decide whether they were communicating with Parry or a human paranoiac were often unable to tell. Nevertheless, neither Parry nor Eliza could reasonably be described as intelligent. Parry’s contributions to the conversation were canned—constructed in advance by the programmer and stored away in the computer’s memory. Eliza, too, relied on canned sentences and simple programming tricks.
AI programming languages
In the course of their work on the Logic Theorist and GPS, Newell, Simon, and Shaw developed their Information Processing Language (IPL), a computer language tailored for AI programming. At the heart of IPL was a highly flexible data structure that they called a list. A list is simply an ordered sequence of items of data. Some or all of the items in a list may themselves be lists. This scheme leads to richly branching structures.
In 1960 John McCarthy combined elements of IPL with the lambda calculus (a formal mathematical-logical system) to produce the programming language LISP (List Processor), which remains the principal language for AI work in the United States. (The lambda calculus itself was invented in 1936 by the Princeton logician Alonzo Church while he was investigating the abstract Entscheidungsproblem, or “decision problem,” for predicate logic—the same problem that Turing had been attacking when he invented the universal Turing machine.)
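The list structure described above, in which items may themselves be lists, is what LISP builds from pairs called cons cells. The Python sketch below imitates cons cells with 2-tuples and uses `None` for the empty list; these representation choices, and every helper name other than the traditional LISP names `cons`, `car`, and `cdr`, are assumptions for illustration.

```python
NIL = None  # the empty list


def cons(first, rest):
    """A cons cell: a pair holding one item and the rest of the list."""
    return (first, rest)


def car(cell):
    return cell[0]  # first item


def cdr(cell):
    return cell[1]  # rest of the list


def lisp_list(*items):
    """Build a linked list; nested Python lists become nested linked lists."""
    result = NIL
    for item in reversed(items):
        if isinstance(item, list):
            item = lisp_list(*item)
        result = cons(item, result)
    return result


def to_string(x):
    """Print in LISP's parenthesized notation."""
    if x is not NIL and not isinstance(x, tuple):
        return str(x)  # an atom
    parts = []
    while x is not NIL:
        parts.append(to_string(car(x)))
        x = cdr(x)
    return "(" + " ".join(parts) + ")"


def count_atoms(x):
    """Count the atoms (non-list items) anywhere in a nested list."""
    if x is NIL:
        return 0
    if not isinstance(x, tuple):
        return 1
    return count_atoms(car(x)) + count_atoms(cdr(x))


tree = lisp_list("animal", ["mammal", "dog", "cat"], ["bird", "robin"])
print(to_string(tree))    # prints (animal (mammal dog cat) (bird robin))
print(count_atoms(tree))  # prints 6
```

Functions such as `count_atoms` follow the shape of the data: one case for the empty list, one for an atom, and a recursive case for a cell, which is why list processing and recursion went hand in hand in early AI languages.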
The logic programming language PROLOG (Programmation en Logique) was conceived by Alain Colmerauer at the University of Aix-Marseille, France, where the language was first implemented in 1973. PROLOG was further developed by the logician Robert Kowalski, a member of the AI group at the University of Edinburgh. This language makes use of a powerful theorem-proving technique known as resolution, invented in 1963 at the U.S. Atomic Energy Commission’s Argonne National Laboratory in Illinois by the British logician Alan Robinson. PROLOG can determine whether or not a given statement follows logically from other given statements. For example, given the statements “All logicians are rational” and “Robinson is a logician,” a PROLOG program responds in the affirmative to the query “Robinson is rational?” PROLOG is widely used for AI work, especially in Europe and Japan.
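The Robinson example corresponds to a two-clause PROLOG program, `rational(X) :- logician(X).` and `logician(robinson).` The Python sketch below answers such queries by backward chaining, the depth-first, goal-directed strategy that PROLOG uses to apply resolution. It is a deliberately simplified illustration resting on stated assumptions: goals are flat tuples, arguments are atoms or variables (strings starting with an uppercase letter), and there is no occurs check, so it is not a faithful PROLOG implementation.

```python
import itertools

# A clause is (head, body): the head is a goal such as ("rational", "X") and the
# body is a list of goals. Arguments starting with an uppercase letter are variables.
PROGRAM = [
    (("rational", "X"), [("logician", "X")]),  # rational(X) :- logician(X).
    (("logician", "robinson"), []),            # logician(robinson).
]

_fresh = itertools.count()


def is_var(term):
    return isinstance(term, str) and term[:1].isupper()


def walk(term, subst):
    """Follow variable bindings until reaching an atom or an unbound variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Unify two flat goals; return an extended substitution, or None on failure."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    subst = dict(subst)
    for x, y in zip(a[1:], b[1:]):
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        else:
            return None  # two different atoms cannot match
    return subst


def rename(clause):
    """Give the clause's variables fresh names so that separate uses do not clash."""
    n = next(_fresh)

    def fresh(goal):
        return tuple(f"{t}_{n}" if is_var(t) else t for t in goal)

    head, body = clause
    return fresh(head), [fresh(g) for g in body]


def solve(goals, program, subst=None):
    """Backward chaining: yield each substitution under which all goals follow."""
    subst = {} if subst is None else subst
    if not goals:
        yield subst
        return
    first, rest = goals[0], goals[1:]
    for clause in program:
        head, body = rename(clause)
        extended = unify(first, head, subst)
        if extended is not None:
            # Replace the goal by the clause body (a resolution step) and continue.
            yield from solve(body + rest, program, extended)


def ask(goal):
    """True if the goal can be proved from PROGRAM."""
    return next(solve([goal], PROGRAM), None) is not None


print(ask(("rational", "robinson")))  # prints True
print(ask(("rational", "turing")))    # prints False
```

A query containing a variable, such as `("rational", "Who")`, yields a substitution in which `Who` is bound to `robinson`, which is how a PROLOG system answers “who is rational?” rather than only yes or no.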
Researchers at the Institute for New Generation Computer Technology in Tokyo have used PROLOG as the basis for sophisticated logic programming languages. Known as fifth-generation languages, these are in use on nonnumerical parallel computers developed at the Institute.
Other recent work includes the development of languages for reasoning about time-dependent data such as “the account was paid yesterday.” These languages are based on tense logic, which permits statements to be located in the flow of time. (Tense logic was invented in 1953 by the philosopher Arthur Prior at the University of Canterbury, Christchurch, New Zealand.)
Microworld programs
To cope with the bewildering complexity of the real world, scientists often ignore less relevant details; for instance, physicists often ignore friction and elasticity in their models. In 1970 Marvin Minsky and Seymour Papert of the MIT AI Laboratory proposed that likewise AI research should focus on developing programs capable of intelligent behaviour in simpler artificial environments known as microworlds. Much research has focused on the so-called blocks world, which consists of coloured blocks of various shapes and sizes arrayed on a flat surface.
An early success of the microworld approach was SHRDLU, written by Terry Winograd of MIT. (Details of the program were published in 1972.) SHRDLU controlled a robot arm that operated above a flat surface strewn with play blocks. Both the arm and the blocks were virtual. SHRDLU would respond to commands typed in natural English, such as “Will you please stack up both of the red blocks and either a green cube or a pyramid.” The program could also answer questions about its own actions.
Although SHRDLU was initially hailed as a major breakthrough, Winograd soon announced that the program was, in fact, a dead end. The techniques pioneered in the program proved unsuitable for application in wider, more interesting worlds. Moreover, the appearance that SHRDLU gave of understanding the blocks microworld, and English statements concerning it, was in fact an illusion. SHRDLU had no idea what a green block was.
Another product of the microworld approach was Shakey, a mobile robot developed at the Stanford Research Institute by Bertram Raphael, Nils Nilsson, and others during the period 1968–72. The robot occupied a specially built microworld consisting of walls, doorways, and a few simply shaped wooden blocks. Each wall had a carefully painted baseboard to enable the robot to “see” where the wall met the floor (a simplification of reality that is typical of the microworld approach). Shakey had about a dozen basic abilities, such as TURN, PUSH, and CLIMB-RAMP.
Expert systems
Expert systems occupy a type of microworld—for example, a model of a ship’s hold and its cargo—that is self-contained and relatively uncomplicated. For such AI systems every effort is made to incorporate all the information about some narrow field that an expert (or group of experts) would know, so that a good expert system can often outperform any single human expert. There are many commercial expert systems, including programs for medical diagnosis, chemical analysis, credit authorization, financial management, corporate planning, financial document routing, oil and mineral prospecting, genetic engineering, automobile design and manufacture, camera lens design, computer installation design, airline scheduling, cargo placement, and automatic help services for home computer owners.
Knowledge and inference
The basic components of an expert system are a knowledge base, or KB, and an inference engine. The information to be stored in the KB is obtained by interviewing people who are expert in the area in question. The interviewer, or knowledge engineer, organizes the information elicited from the experts into a collection of rules, typically of an “if-then” structure. Rules of this type are called production rules. The inference engine enables the expert system to draw deductions from the rules in the KB. For example, if the KB contains the production rules “if x, then y” and “if y, then z,” the inference engine is able to deduce “if x, then z.” The expert system might then query its user, “Is x true in the situation that we are considering?” If the answer is affirmative, the system will proceed to infer z.
Some expert systems use fuzzy logic. In standard logic there are only two truth values, true and false. This absolute precision makes vague attributes or situations difficult to characterize. (When, precisely, does a thinning head of hair become a bald head?) Often the rules that human experts use contain vague expressions, and so it is useful for an expert system’s inference engine to employ fuzzy logic.
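An inference engine of the kind just described can be sketched as forward chaining: any rule whose premises are all known fires and adds its conclusion, and the process repeats until nothing changes. The Python sketch below also includes a fuzzy membership function for the baldness example, returning a degree of truth between 0 and 1 instead of simply true or false. The rule format, the function names, and the hair-count cut-offs are assumptions made for illustration; real expert-system shells use richer rule languages.

```python
def forward_chain(facts, rules):
    """Fire if-then production rules until no new conclusion can be added."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)  # the rule fires
                changed = True
    return known


def bald(hair_count, fully_bald=10_000, not_bald=100_000):
    """Fuzzy truth value of 'bald': 1.0 at or below fully_bald hairs, 0.0 at or above
    not_bald, and a straight line in between. Both cut-offs are arbitrary assumptions."""
    if hair_count <= fully_bald:
        return 1.0
    if hair_count >= not_bald:
        return 0.0
    return (not_bald - hair_count) / (not_bald - fully_bald)


RULES = [({"x"}, "y"), ({"y"}, "z")]  # "if x, then y" and "if y, then z"
print(sorted(forward_chain({"x"}, RULES)))  # prints ['x', 'y', 'z']
print(bald(55_000))                          # prints 0.5
```

Once the user confirms x, the chain x to y to z is followed automatically. The fuzzy function, by contrast, lets a rule about baldness apply to a partial degree rather than switching abruptly from false to true at some arbitrary hair count.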
DENDRAL
In 1965 the AI researcher Edward Feigenbaum and the geneticist Joshua Lederberg, both of Stanford University, began work on Heuristic DENDRAL (later shortened to DENDRAL), a chemical-analysis expert system. The substance to be analyzed might, for example, be a complicated compound of carbon, hydrogen, and nitrogen. Starting from spectrographic data obtained from the substance, DENDRAL would hypothesize the substance’s molecular structure. DENDRAL’s performance rivaled that of chemists expert at this task, and the program was used in industry and in academia.
MYCIN
Work on MYCIN, an expert system for treating blood infections, began at Stanford University in 1972. MYCIN would attempt to diagnose patients based on reported symptoms and medical test results. The program could request further information concerning the patient, as well as suggest additional laboratory tests, to arrive at a probable diagnosis, after which it would recommend a course of treatment. If requested, MYCIN would explain the reasoning that led to its diagnosis and recommendation. Using about 500 production rules, MYCIN operated at roughly the same level of competence as human specialists in blood infections and rather better than general practitioners.
Nevertheless, expert systems have no common sense or understanding of the limits of their expertise. For instance, if MYCIN were told that a patient who had received a gunshot wound was bleeding to death, the program would attempt to diagnose a bacterial cause for the patient’s symptoms. Expert systems can also act on absurd clerical errors, such as prescribing an obviously incorrect dosage of a drug for a patient whose weight and age data were accidentally transposed.
The CYC project
CYC is a large experiment in symbolic AI. The project began in 1984 under the auspices of the Microelectronics and Computer Technology Corporation, a consortium of computer, semiconductor, and electronics manufacturers. In 1995 Douglas Lenat, the CYC project director, spun off the project as Cycorp, Inc., based in Austin, Texas. The most ambitious goal of Cycorp was to build a KB containing a significant percentage of the commonsense knowledge of a human being. Millions of commonsense assertions, or rules, were coded into CYC. The expectation was that this “critical mass” would allow the system itself to extract further rules directly from ordinary prose and eventually serve as the foundation for future generations of expert systems.
With only a fraction of its commonsense KB compiled, CYC could draw inferences that would defeat simpler systems. For example, CYC could infer, “Garcia is wet,” from the statement, “Garcia is finishing a marathon run,” by employing its rules that running a marathon entails high exertion, that people sweat at high levels of exertion, and that when something sweats it is wet. Among the outstanding remaining problems are issues in searching and problem solving—for example, how to search the KB automatically for information that is relevant to a given problem. AI researchers call the problem of updating, searching, and otherwise manipulating a large structure of symbols in realistic amounts of time the frame problem. Some critics of symbolic AI believe that the frame problem is largely unsolvable and so maintain that the symbolic approach will never yield genuinely intelligent systems. It is possible that CYC, for example, will succumb to the frame problem long before the system achieves human levels of knowledge.
Connectionism
Connectionism, or neuronlike computing, developed out of attempts to understand how the human brain works at the neural level and, in particular, how people learn and remember. In 1943 the neurophysiologist Warren McCulloch of the University of Illinois and the mathematician Walter Pitts of the University of Chicago published an influential treatise on neural nets and automatons, according to which each neuron in the brain is a simple digital processor and the brain as a whole is a form of computing machine. As McCulloch put it subsequently, “What we thought we were doing (and I think we succeeded fairly well) was treating the brain as a Turing machine.”
Creating an artificial neural network
It was not until 1954, however, that Belmont Farley and Wesley Clark of MIT succeeded in running the first artificial neural network—albeit limited by computer memory to no more than 128 neurons. They were able to train their networks to recognize simple patterns. In addition, they discovered that the random destruction of up to 10 percent of the neurons in a trained network did not affect the network’s performance—a feature that is reminiscent of the brain’s ability to tolerate limited damage inflicted by surgery, accident, or disease.
The simple neural network depicted in the figure illustrates the central ideas of connectionism. Four of the network’s five neurons are for input, and the fifth—to which each of the others is connected—is for output. Each of the neurons is either firing (1) or not firing (0). Each connection leading to N, the output neuron, has a “weight.” What is called the total weighted input into N is calculated by adding up the weights of all the connections leading to N from neurons that are firing. For example, suppose that only two of the input neurons, X and Y, are firing. Since the weight of the connection from X to N is 1.5 and the weight of the connection from Y to N is 2, it follows that the total weighted input to N is 3.5. As shown in the figure, N has a firing threshold of 4. That is to say, if N’s total weighted input equals or exceeds 4, then N fires; otherwise, N does not fire. So, for example, N does not fire if the only input neurons to fire are X and Y, but N does fire if X, Y, and Z all fire.
The network is trained by presenting it with input patterns whose desired outputs are known. Each time a pattern is presented, N’s actual output is compared with the desired output; if the two agree, the weights are left unchanged, and if they disagree, the weights are adjusted according to the following rule:
- If the actual output is 0 and the desired output is 1, increase by a small fixed amount the weight of each connection leading to N from neurons that are firing (thus making it more likely that N will fire the next time the network is given the same pattern);
- If the actual output is 1 and the desired output is 0, decrease by that same small amount the weight of each connection leading to the output neuron from neurons that are firing (thus making it less likely that the output neuron will fire the next time the network is given that pattern as input).
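The firing rule and the training rule can be written out directly. The Python sketch below reproduces the worked example (X and Y firing give a total weighted input of 3.5, below the threshold of 4) and then applies one training step. The size of the fixed adjustment and the weights of Z and of the network’s fourth input neuron, which the missing figure would show, are placeholder assumptions; the name W for the fourth neuron is also hypothetical.

```python
THRESHOLD = 4.0  # N fires when its total weighted input is at least 4
STEP = 0.25      # the "small fixed amount"; this value is an assumption

# Weights of the connections into N. X and Y come from the text; the weights for
# Z and for the fourth input (called W here) are placeholders, as the figure is missing.
weights = {"X": 1.5, "Y": 2.0, "Z": 1.0, "W": 1.0}


def total_weighted_input(weights, firing):
    """Sum the weights of connections from input neurons that are firing."""
    return sum(weights[name] for name in firing)


def output(weights, firing):
    """N's output: 1 if the weighted input reaches the threshold, else 0."""
    return 1 if total_weighted_input(weights, firing) >= THRESHOLD else 0


def train_step(weights, firing, desired):
    """Apply the error-correction rule once for one input pattern."""
    actual = output(weights, firing)
    if actual == desired:
        return  # correct response: leave the weights alone
    change = STEP if desired == 1 else -STEP
    for name in firing:  # only connections from firing neurons are adjusted
        weights[name] += change


print(total_weighted_input(weights, {"X", "Y"}))  # prints 3.5
print(output(weights, {"X", "Y"}))                # prints 0
train_step(weights, {"X", "Y"}, desired=1)        # X becomes 1.75, Y becomes 2.25
print(output(weights, {"X", "Y"}))                # prints 1
```

Only the connections from X and Y change during this step; the weights from the silent neurons Z and W are untouched, exactly as the rule specifies.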
Perceptrons
In 1957 Frank Rosenblatt of the Cornell Aeronautical Laboratory at Cornell University in Ithaca, New York, began investigating artificial neural networks that he called perceptrons. He made major contributions to the field of AI, both through experimental investigations of the properties of neural networks (using computer simulations) and through detailed mathematical analysis. Rosenblatt was a charismatic communicator, and there were soon many research groups in the United States studying perceptrons. Rosenblatt and his followers called their approach connectionist to emphasize the importance in learning of the creation and modification of connections between neurons. Modern researchers have adopted this term.
One of Rosenblatt’s contributions was to generalize the training procedure that Farley and Clark had applied to only two-layer networks so that the procedure could be applied to multilayer networks. Rosenblatt used the phrase “back-propagating error correction” to describe his method. The method, with substantial improvements and extensions by numerous scientists, and the term back-propagation are now in everyday use in connectionism.
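The Python sketch below shows back-propagation in its modern form: the output error is passed backward through the connection weights to give each hidden neuron its share of the error, and every weight is then nudged downhill on a squared-error loss. This is the later gradient-descent formulation rather than Rosenblatt’s own procedure, and the network size, learning rate, number of training passes, and the XOR training task are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """Two inputs, one hidden layer of sigmoid units, one sigmoid output."""

    def __init__(self, hidden=3, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Compute the output, remembering activations for the backward pass."""
        self.x = x
        self.h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        self.y = sigmoid(sum(w * h for w, h in zip(self.w2, self.h)) + self.b2)
        return self.y

    def backward(self, target, lr):
        """Propagate the error back and take one gradient-descent step
        on the loss 0.5 * (y - target) ** 2 for the last input seen."""
        delta_out = (self.y - target) * self.y * (1 - self.y)
        # Each hidden unit's share of the error, computed before w2 changes.
        delta_hidden = [delta_out * w * h * (1 - h) for w, h in zip(self.w2, self.h)]
        for j, h in enumerate(self.h):
            self.w2[j] -= lr * delta_out * h
        self.b2 -= lr * delta_out
        for j, d in enumerate(delta_hidden):
            for i, xi in enumerate(self.x):
                self.w1[j][i] -= lr * d * xi
            self.b1[j] -= lr * d


XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def train(net, data, epochs=5000, lr=0.5):
    """Present each example in turn, adjusting the weights after every presentation."""
    for _ in range(epochs):
        for x, target in data:
            net.forward(x)
            net.backward(target, lr)
    return net


if __name__ == "__main__":
    net = train(TinyNet(), XOR)
    for x, target in XOR:
        print(x, target, round(net.forward(x), 2))
```

Unlike the single-output rule given earlier in this article, which adjusts only the connections feeding the output neuron, this procedure also trains the hidden layer. That is what allows a multilayer network to learn a function such as XOR, which no single threshold neuron can compute.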
Conjugating verbs
In one famous connectionist experiment conducted at the University of California at San Diego (published in 1986), David Rumelhart and James McClelland trained a network of 920 artificial neurons, arranged in two layers of 460 neurons, to form the past tenses of English verbs. Root forms of verbs—such as come, look, and sleep—were presented to one layer of neurons, the input layer. A supervisory computer program observed the difference between the actual response at the layer of output neurons and the desired response—came, say—and then mechanically adjusted the connections throughout the network in accordance with the procedure described above to give the network a slight push in the direction of the correct response. About 400 different verbs were presented one by one to the network, and the connections were adjusted after each presentation. This whole procedure was repeated about 200 times using the same verbs, after which the network could correctly form the past tense of many unfamiliar verbs as well as of the original verbs. For example, when presented for the first time with guard, the network responded guarded; with weep, wept; with cling, clung; and with drip, dripped (complete with double p). This is a striking example of learning involving generalization. (Sometimes, though, the peculiarities of English were too much for the network, and it formed squawked from squat, shipped from shape, and membled from mail.)
Another name for connectionism is parallel distributed processing, which emphasizes two important features. First, a large number of relatively simple processors—the neurons—operate in parallel. Second, neural networks store information in a distributed fashion, with each individual connection participating in the storage of many different items of information. The know-how that enabled the past-tense network to form wept from weep, for example, was not stored in one specific location in the network but was spread throughout the entire pattern of connection weights that was forged during training. The human brain also appears to store information in a distributed fashion, and connectionist research is contributing to attempts to understand how it does so.
Other neural networks
Other work on neuronlike computing includes the following:
- Visual perception. Networks can recognize faces and other objects from visual data. A neural network designed by John Hummel and Irving Biederman at the University of Minnesota can identify about 10 objects from simple line drawings. The network is able to recognize the objects—which include a mug and a frying pan—even when they are drawn from different angles. Networks investigated by Tomaso Poggio of MIT are able to recognize bent-wire shapes drawn from different angles, faces photographed from different angles and showing different expressions, and objects from cartoon drawings with gray-scale shading indicating depth and orientation.
- Language processing. Neural networks are able to convert handwritten and typewritten material to electronic text. The U.S. Internal Revenue Service has commissioned a neuronlike system that will automatically read tax returns and correspondence. Neural networks also convert speech to printed text and printed text to speech.
- Financial analysis. Neural networks are being used increasingly for loan risk assessment, real estate valuation, bankruptcy prediction, share price prediction, and other business applications.
- Medicine. Medical applications include detecting lung nodules and heart arrhythmias and predicting adverse drug reactions.
- Telecommunications. Telecommunications applications of neural networks include control of telephone switching networks and echo cancellation in modems and on satellite links.
Nouvelle AI
New foundations
The approach now known as nouvelle AI was pioneered at the MIT AI Laboratory by the Australian Rodney Brooks during the latter half of the 1980s. Nouvelle AI distances itself from strong AI, with its emphasis on human-level performance, in favour of the relatively modest aim of insect-level performance. At a very fundamental level, nouvelle AI rejects symbolic AI’s reliance upon constructing internal models of reality, such as those described in the section Microworld programs. Practitioners of nouvelle AI assert that true intelligence involves the ability to function in a real-world environment.
A central idea of nouvelle AI is that intelligence, as expressed by complex behaviour, “emerges” from the interaction of a few simple behaviours. For example, a robot whose simple behaviours include collision avoidance and motion toward a moving object will appear to stalk the object, pausing whenever it gets too close.
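The stalking example can be sketched in one dimension. In the Python sketch below, each simple behaviour proposes a motion, and a fixed priority order lets collision avoidance override motion toward the target; no part of the program represents “stalking,” yet the trace shows the robot following the target and pausing when it gets too close. The one-dimensional world, the speeds and distances, and the priority-list arbitration are simplifying assumptions, not a description of how Brooks’s robots were actually built.

```python
def approach(robot, target, speed=1.0):
    """Behaviour 1: step toward the target, at most `speed` per tick."""
    gap = target - robot
    return max(-speed, min(speed, gap))


def avoid(robot, target, min_gap=2.0):
    """Behaviour 2: when closer than min_gap, command a halt; otherwise stay silent."""
    return 0.0 if abs(target - robot) < min_gap else None


BEHAVIOURS = (avoid, approach)  # earlier entries take priority


def arbitrate(robot, target):
    """The first behaviour with an opinion suppresses the ones after it."""
    for behaviour in BEHAVIOURS:
        command = behaviour(robot, target)
        if command is not None:
            return command
    return 0.0


def simulate(target_path, robot=0.0):
    """Return the robot's position after each tick as the target moves along target_path."""
    trace = []
    for target in target_path:
        robot += arbitrate(robot, target)
        trace.append(robot)
    return trace


print(simulate([5, 6, 7, 8, 8, 8, 8, 8, 12, 13]))
# prints [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 7.0, 8.0, 9.0]
```

In the trace, the robot closes in while the target moves ahead, halts at 7 once the stationary target at 8 is within the minimum gap, and resumes as soon as the target moves off again.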
One famous example of nouvelle AI is Brooks’s robot Herbert (named after Herbert Simon), whose environment is the busy offices of the MIT AI Laboratory. Herbert searches desks and tables for empty soda cans, which it picks up and carries away. The robot’s seemingly goal-directed behaviour emerges from the interaction of about 15 simple behaviours. More recently, Brooks has constructed prototypes of mobile robots for exploring the surface of Mars.
The situated approach
Traditional AI has by and large attempted to build disembodied intelligences whose only interaction with the world has been indirect (CYC, for example). Nouvelle AI, on the other hand, attempts to build embodied intelligences situated in the real world—a method that has come to be known as the situated approach. Brooks quoted approvingly from the brief sketches that Turing gave in 1948 and 1950 of the situated approach. By equipping a machine “with the best sense organs that money can buy,” Turing wrote, the machine might be taught “to understand and speak English” by a process that would “follow the normal teaching of a child.” Turing contrasted this with the approach to AI that focuses on abstract activities, such as the playing of chess. He advocated that both approaches be pursued, but until recently little attention has been paid to the situated approach.
The situated approach was also anticipated in the writings of the philosopher Bert Dreyfus of the University of California at Berkeley. Beginning in the early 1960s, Dreyfus opposed the physical symbol system hypothesis, arguing that intelligent behaviour cannot be completely captured by symbolic descriptions. As an alternative, Dreyfus advocated a view of intelligence that stressed the need for a body that could move about, interacting directly with tangible physical objects. Once reviled by advocates of AI, Dreyfus is now regarded as a prophet of the situated approach.
Critics of nouvelle AI point out the failure to produce a system exhibiting anything like the complexity of behaviour found in real insects. Suggestions by researchers that their nouvelle systems may soon be conscious and possess language seem entirely premature.
Is strong AI possible?
The ongoing success of applied AI and of cognitive simulation, as described in the preceding sections of this article, seems assured. However, strong AI—that is, artificial intelligence that aims to duplicate human intellectual abilities—remains controversial. Exaggerated claims of success, in professional journals as well as the popular press, have damaged its reputation. At the present time even an embodied system displaying the overall intelligence of a cockroach is proving elusive, let alone a system that can rival a human being. The difficulty of scaling up AI’s modest achievements cannot be overstated. Five decades of research in symbolic AI have failed to produce any firm evidence that a symbol system can manifest human levels of general intelligence; connectionists are unable to model the nervous systems of even the simplest invertebrates; and critics of nouvelle AI regard as simply mystical the view that high-level behaviours involving language understanding, planning, and reasoning will somehow emerge from the interaction of basic behaviours such as obstacle avoidance, gaze control, and object manipulation.
However, this lack of substantial progress may simply be testimony to the difficulty of strong AI, not to its impossibility. Let us turn to the very idea of strong artificial intelligence. Can a computer possibly think? Noam Chomsky suggests that debating this question is pointless, for it is an essentially arbitrary decision whether to extend common usage of the word think to include machines. There is, Chomsky claims, no factual question as to whether any such decision is right or wrong—just as there is no question as to whether our decision to say that airplanes fly is right, or our decision not to say that ships swim is wrong. However, this seems to oversimplify matters. The important question is, Could it ever be appropriate to say that computers think, and, if so, what conditions must a computer satisfy in order to be so described?
Some authors offer the Turing test as a definition of intelligence. However, Turing himself pointed out that a computer that ought to be described as intelligent might nevertheless fail his test if it were incapable of successfully imitating a human being. For example, why should an intelligent robot designed to oversee mining on the Moon necessarily be able to pass itself off in conversation as a human being? If an intelligent entity can fail the test, then the test cannot function as a definition of intelligence. It is even questionable whether passing the test would actually show that a computer is intelligent, as the information theorist Claude Shannon and the AI pioneer John McCarthy pointed out in 1956. Shannon and McCarthy argued that it is possible, in principle, to design a machine containing a complete set of canned responses to all the questions that an interrogator could possibly ask during the fixed time span of the test. Like Parry, this machine would produce answers to the interviewer’s questions by looking up appropriate responses in a giant table. This objection seems to show that in principle a system with no intelligence at all could pass the Turing test.
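The Shannon and McCarthy objection describes a purely mechanical procedure, so it can be shown in a few lines. The sketch below is a toy: the table entries and the fallback reply are invented for illustration, and the real thought experiment assumes a table covering every possible conversation of bounded length, which would be astronomically large.

```python
# A toy "canned response" machine in the spirit of the Shannon and McCarthy
# objection: no reasoning at all, only a lookup in a table prepared in advance.
# The entries below are hypothetical examples.
CANNED_RESPONSES = {
    "hello": "Hello! Nice to meet you.",
    "how are you?": "Fine, thanks. And you?",
    "what is 2 + 2?": "4, of course.",
}

FALLBACK = "Let's talk about something else."


def normalize(question: str) -> str:
    """Lower-case the question and collapse whitespace so trivial variants share a key."""
    return " ".join(question.lower().split())


def respond(question: str) -> str:
    """Return the stored answer, or a stock evasion if the table has no entry."""
    return CANNED_RESPONSES.get(normalize(question), FALLBACK)


if __name__ == "__main__":
    print(respond("  How are you? "))
```

Nothing in the program models meaning; any apparent understanding lives entirely in whoever filled in the table.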
In fact, AI has no real definition of intelligence to offer, not even in the subhuman case. Rats are intelligent, but what exactly must an artificial intelligence achieve before researchers can claim this level of success? In the absence of a reasonably precise criterion for when an artificial system counts as intelligent, there is no objective way of telling whether an AI research program has succeeded or failed. One result of AI’s failure to produce a satisfactory criterion of intelligence is that, whenever researchers achieve one of AI’s goals—for example, a program that can summarize newspaper articles or beat the world chess champion—critics are able to say “That’s not intelligence!” Marvin Minsky’s response to the problem of defining intelligence is to maintain—like Turing before him—that intelligence is simply our name for any problem-solving mental process that we do not yet understand.
AI & Big Data
Facebook is designing at least one ASIC in a silicon team focused on working with chip partners, and it rallied support for an AI compiler.
These are three of the biggest problems facing today's AI
A common theme runs through them: humility, or at least, the need for it.
While companies like Google are confidently pronouncing that we live in an "AI-first age," with machine learning breaking new ground in areas like speech and image recognition, those at the front lines of AI research are keen to point out that there’s still a lot of work to be done. Just because we have digital assistants that sound like the talking computers in movies doesn’t mean we’re much closer to creating true artificial intelligence.
Problems include the need for vast amounts of data to power deep learning systems; our inability to create AI that is good at more than one task; and the lack of insight we have into how these systems work in the first place. Machine learning in 2016 is creating brilliant tools, but they can be hard to explain, costly to train, and often mysterious even to their creators. Let’s take a look at these challenges in more detail:
First you get the data, then you get the AI
We all know that artificial intelligence needs data to learn about the world, but we often overlook how much data is involved. These systems don’t just require more information than humans to understand concepts or recognize features; they require hundreds of thousands of times more. "And if you look at all the application domains where deep learning is successful, you’ll see they’re domains where we can acquire a lot of data." Here, big tech firms like Google and Facebook have access to mountains of data (for example, your voice searches on Android), making it much easier to create useful tools.
Right now, data is like coal was in the early years of the Industrial Revolution. Consider the example of Thomas Newcomen, an Englishman who, in 1712, invented a primitive version of the steam engine that ran on coal, about 60 years before James Watt did. Newcomen’s invention wasn’t very good: compared to Watt’s machine, it was inefficient and costly to run. That meant it was put to work only in coalfields, where the fuel was plentiful enough to overcome the machine’s handicaps.
Around the world there are hundreds of Newcomens working on their own machine learning models. They might be revolutionary, but without the data to make them work, we’ll never know. Big tech firms like Google, Facebook, and Microsoft are today’s coal mines. They have abundant data and so can afford to run inefficient machine learning systems, and improve them. Smaller startups might have good ideas, but they won’t be able to follow through without data.
The problem is even bigger when you look at areas where data is difficult to get your hands on. Take health care, for example, where AI is being used for machine vision tasks like recognizing tumors in X-ray scans, but where digitized data can be sparse. The tricky bit here is that it’s "generally considered unethical to force people to become sick to acquire data." (That’s what makes deals like the one struck between Google and the National Health Service in the UK so significant.) The real problem is not so much finding ways to distribute data as making our deep learning systems more efficient and able to work with less data. And just like Watt’s improvements, that might take another 60 years.
Specialization is for insects — AI needs to be able to multitask
There’s another key problem with deep learning: all our current systems are, essentially, idiot savants. Once they’ve been trained, they can be incredibly efficient at tasks like recognizing cats or playing Atari games, says Hadsell, a researcher at Google DeepMind. But "there is no neural network in the world, and no method right now, that can be trained to identify objects and images, play Space Invaders, and listen to music." (Neural networks are the building blocks of deep learning systems.)
The problem is even worse than that, though. When Google’s DeepMind announced in February last year that it’d built a system that could beat 49 Atari games, it was certainly a massive achievement, but each time it beat a game the system needed to be retrained to beat the next one. As Hadsell points out, you can’t try to learn all the different games at once, as the rules end up interfering with one another. You can learn them one at a time — but you end up forgetting whatever you knew about previous games. "To get to artificial general intelligence we need something that can learn multiple tasks," says Hadsell. "But we can’t even learn multiple games."
A solution to this might be something called progressive neural networks — this means connecting separate deep learning systems together so that they can pass on certain bits of information. In a paper published on this topic in June, Hadsell and her team showed how their progressive neural nets were able to adapt to games of Pong that varied in small ways (in one version the colors were inverted; in another the controls were flipped) much faster than a normal neural net, which had to learn each game from scratch.
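To make "connecting separate deep learning systems together" concrete, here is a minimal forward-pass sketch of the progressive idea: each new task gets its own column, earlier columns are frozen, and the new column receives the earlier columns' hidden activations through lateral connections. This is a simplification under stated assumptions (one hidden layer per column, lateral connections only into the output layer, random untrained weights, no training loop); the names `Column` and `forward` are hypothetical, not DeepMind's code.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class Column:
    """One small network ("column"): input -> hidden -> output."""

    def __init__(self, n_in, n_hidden, n_out, n_prev, rng):
        self.w_hidden = rand_matrix(n_hidden, n_in, rng)
        self.w_out = rand_matrix(n_out, n_hidden, rng)
        # Lateral adapters: one per earlier column, reading that column's hidden layer.
        self.lateral = [rand_matrix(n_out, n_hidden, rng) for _ in range(n_prev)]
        # Marker only: training is not shown, but an optimizer should skip frozen columns.
        self.frozen = False


def forward(columns, x):
    """Outputs of every column; column k also receives hidden activations of columns 0..k-1."""
    hiddens, outputs = [], []
    for k, col in enumerate(columns):
        h = relu(matvec(col.w_hidden, x))
        out = matvec(col.w_out, h)
        for j in range(k):
            out = add(out, matvec(col.lateral[j], hiddens[j]))
        hiddens.append(h)
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)
    pong = Column(n_in=4, n_hidden=8, n_out=2, n_prev=0, rng=rng)
    pong.frozen = True  # task 1 is learned first, then frozen
    pong_flipped = Column(n_in=4, n_hidden=8, n_out=2, n_prev=1, rng=rng)
    print(forward([pong, pong_flipped], [1.0, 0.0, 0.5, -0.5]))
```

Because earlier columns never read from later ones, adding a column for a new game cannot change what the old column computes, which is how this design sidesteps forgetting.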
It’s a promising method, and in more recent experiments it’s even been applied to robotic arms, speeding up their learning process from a matter of weeks to just a single day. However, Hadsell notes significant limitations: progressive neural networks can’t simply keep on adding new tasks to their memory. If you keep chaining systems together, sooner or later you end up with a model that is "too large to be tractable," she says. And that’s when the different tasks being managed are essentially similar; creating a human-level intelligence that can write a poem, solve differential equations, and design a chair is something else altogether.
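One way to see why chaining columns eventually becomes "too large to be tractable" is to count weights. The sketch assumes fully connected columns whose layers all have the same width, and one lateral adapter per earlier column into every layer after the first; biases are ignored. Under those assumptions the lateral weights grow with the square of the number of tasks.

```python
def progressive_param_count(num_columns: int, num_layers: int, width: int) -> int:
    """Weights (biases ignored) in a progressive net of identical fully connected columns.

    Simplifying assumptions: every layer is `width` units wide, and each layer after
    the first receives one `width x width` lateral adapter from the previous layer
    of every earlier column.
    """
    total = 0
    for k in range(num_columns):  # k = number of earlier columns this one connects to
        own = num_layers * width * width
        lateral = (num_layers - 1) * k * width * width
        total += own + lateral
    return total


if __name__ == "__main__":
    for n_tasks in (1, 2, 5, 10, 50):
        print(n_tasks, progressive_param_count(n_tasks, num_layers=3, width=256))
```

Column k carries k sets of lateral adapters, so the total contains a term proportional to K(K - 1)/2 for K tasks.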
It's only real intelligence if you can show your working
Another major challenge is understanding how artificial intelligence reaches its conclusions in the first place. Neural networks are usually inscrutable to observers. Although we know how they’re put together and what information goes into them, the reasons why they come to certain decisions usually go unexplained.
AI looks for curtains on the floor, not on the windows
A good demonstration of this problem comes from an experiment at Virginia Tech. Researchers created what is essentially an eye-tracking system for a neural network, which records which pixels the computer looks at first. The researchers showed the neural net pictures of a bedroom and asked the AI: "What is covering the windows?" They found that instead of looking at the windows, the AI looked at the floor. Then, if it found a bed, it gave the answer "there are curtains covering the windows." This happened to be right, but only because of the limited data the network had been trained on. Based on the pictures it had been shown, the neural net had deduced that if it was in a bedroom there would be curtains on the windows. So when it saw a bed, it stopped looking — it had, in its eyes, seen curtains. Logical, of course, but also daft. A lot of bedrooms don’t have curtains!
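The Virginia Tech work recorded where the network "looked". A simpler, different probing technique, occlusion sensitivity, asks a related question: which parts of the image, when blanked out, change the answer? The sketch below swaps in that technique and applies it to a hypothetical shortcut model that, like the network in the story, bases its "curtains" score only on the bottom half of the picture. The model, the 4x4 image, and the patch size are all invented for illustration.

```python
def curtain_model(image):
    """Hypothetical shortcut model: its 'curtains' score only uses the bottom half (the floor)."""
    half = len(image) // 2
    return sum(sum(row) for row in image[half:])


def occlusion_map(model, image, patch=2, fill=0.0):
    """For each pixel, how much the model's score drops when its patch is blanked out."""
    base = model(image)
    rows, cols = len(image), len(image[0])
    heat = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, patch):
        for c0 in range(0, cols, patch):
            block = [(r, c)
                     for r in range(r0, min(r0 + patch, rows))
                     for c in range(c0, min(c0 + patch, cols))]
            occluded = [row[:] for row in image]
            for r, c in block:
                occluded[r][c] = fill
            drop = base - model(occluded)
            for r, c in block:
                heat[r][c] = drop
    return heat


if __name__ == "__main__":
    bedroom = [[1.0] * 4 for _ in range(4)]  # rows 0-1: windows, rows 2-3: floor and bed
    for row in occlusion_map(curtain_model, bedroom):
        print(row)
```

Blanking the windows leaves the score untouched while blanking the floor lowers it, exposing the shortcut in the same spirit as the eye-tracking experiment.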
Eye-tracking is one way of shining some light on the inner workings, but another might be to build more coherence into deep learning systems from the get-go. One way of doing this is revisiting an old, unfashionable strand of artificial intelligence known as symbolic AI or Good Old-Fashioned Artificial Intelligence (GOFAI). This is based on the hypothesis that what goes on in the mind can be reduced to basic logic, where the world is defined by a complex dictionary of symbols. By combining these symbols (which represent actions, events, objects, etc.), you can basically synthesize thinking. (If creating an AI this way sounds like a monstrous, unthinkable task, just imagine trying it on computers that still ran on magnetic tape.)
Such symbols would provide the systems with a starting point for understanding the world, rather than just feeding them data and waiting for them to notice patterns. This, according to one researcher pursuing the approach, might not only solve the transparency problem of AI, but also the problem of transfer learning outlined above. "It’s all very well to say that Breakout is similar to Pong because they’ve both got paddles and balls, but human-level cognition makes these types of connections on a much more dramatic scale," he says. "Like the connection between the structure of the atom and the structure of the solar system."
It’s still in its infancy, and finding out whether it will scale up to larger systems and different types of data will be telling. However, there’s every chance it could develop into something more. After all, deep learning itself was an unloved department of AI until researchers began to plug in the cheap data and abundant processing power made available in recent years. Maybe it’s time for another blast from AI’s past to try its skills in a new environment.
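The symbolic idea described in this section, named facts combined by explicit rules, can be illustrated with a tiny forward-chaining engine. The facts and rules below are hypothetical and chosen to echo the curtains example; the point is that every conclusion comes with the rule that produced it, which is the kind of transparency the symbolic approach promises.

```python
# Each rule: (name, set of premise symbols, conclusion symbol). Hypothetical content.
RULES = [
    ("R1", {"bed_visible"}, "room_is_bedroom"),
    ("R2", {"room_is_bedroom", "window_visible"}, "window_may_have_curtains"),
    ("R3", {"fabric_on_window"}, "window_has_curtains"),
]


def forward_chain(facts, rules):
    """Fire every rule whose premises are all known, until no new symbol is derived.

    Returns the final set of known symbols and a trace of (rule name, conclusion)
    pairs, so each conclusion can be explained by the rule that produced it.
    """
    known, trace = set(facts), []
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                trace.append((name, conclusion))
                changed = True
    return known, trace


if __name__ == "__main__":
    known, trace = forward_chain({"bed_visible", "window_visible"}, RULES)
    print(trace)
```

Seeing a bed only licenses "may have curtains"; asserting that the curtains are actually there would need a separate fact about the window itself.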
Robots and Artificial Intelligence
Artificial intelligence (AI) is arguably the most exciting field in robotics. It's certainly the most controversial: Everybody agrees that a robot can work in an assembly line, but there's no consensus on whether a robot can ever be intelligent.
Like the term "robot" itself, artificial intelligence is hard to define. Ultimate AI would be a recreation of the human thought process -- a man-made machine with our intellectual abilities. This would include the ability to learn just about anything, the ability to reason, the ability to use language and the ability to formulate original ideas. Roboticists are nowhere near achieving this level of artificial intelligence, but they have made a lot of progress with more limited AI. Today's AI machines can replicate some specific elements of intellectual ability.
Computers can already solve problems in limited realms. The basic idea of AI problem-solving is very simple, though its execution is complicated. First, the AI robot or computer gathers facts about a situation through sensors or human input. The computer compares this information to stored data and decides what the information signifies. The computer runs through various possible actions and predicts which action will be most successful based on the collected information. Of course, the computer can only solve problems it's programmed to solve -- it doesn't have any generalized analytical ability. Chess computers are one example of this sort of machine.
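The gather, compare, predict, act cycle described above can be sketched as follows. Everything here is hypothetical: a single distance sensor, a 30 cm threshold, and a hand-written table of predicted success rates standing in for "stored data". As the paragraph notes, the program can only handle situations that appear in its table.

```python
# Stored data: situation -> {action: predicted success rate}. Hypothetical values.
KNOWLEDGE = {
    "obstacle_ahead": {"go_forward": 0.1, "turn_left": 0.8, "turn_right": 0.7},
    "path_clear": {"go_forward": 0.9, "turn_left": 0.3, "turn_right": 0.3},
}


def interpret(distance_cm: float, threshold_cm: float = 30.0) -> str:
    """Compare a raw sensor reading with stored knowledge to decide what it signifies."""
    return "obstacle_ahead" if distance_cm < threshold_cm else "path_clear"


def choose_action(distance_cm: float) -> str:
    """Pick the action predicted to be most successful in the interpreted situation."""
    situation = interpret(distance_cm)
    predictions = KNOWLEDGE[situation]  # only situations it was programmed for
    return max(predictions, key=predictions.get)


if __name__ == "__main__":
    print(choose_action(12.0), choose_action(100.0))
```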
Some modern robots also have the ability to learn in a limited capacity. Learning robots recognize whether a certain action (moving their legs in a certain way, for instance) achieved a desired result (navigating an obstacle). The robot stores this information and attempts the successful action the next time it encounters the same situation. Again, modern computers can only do this in very limited situations. They can't absorb any sort of information like a human can. Some robots can learn by mimicking human actions. In Japan, roboticists have taught a robot to dance by demonstrating the moves themselves.
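The "store what worked and try it first next time" behaviour can be sketched with a simple memory table. The class name, the situation label, and the candidate actions are hypothetical; `succeeded` stands in for whatever sensor check tells the robot that an action achieved the desired result.

```python
class LearningRobot:
    """Remembers which action worked in which situation and tries it first next time."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.memory = {}  # situation -> action that previously succeeded

    def handle(self, situation, succeeded):
        """Try actions until `succeeded(action)` is True; return (action, attempts used)."""
        candidates = list(self.actions)
        if situation in self.memory:
            remembered = self.memory[situation]
            candidates.remove(remembered)
            candidates.insert(0, remembered)
        for attempts, action in enumerate(candidates, start=1):
            if succeeded(action):
                self.memory[situation] = action
                return action, attempts
        return None, len(candidates)


if __name__ == "__main__":
    robot = LearningRobot(["step_short", "step_high", "step_long"])
    clears_obstacle = lambda action: action == "step_high"
    print(robot.handle("low_obstacle", clears_obstacle))  # learns by trial
    print(robot.handle("low_obstacle", clears_obstacle))  # reuses what worked
```

This only works for situations the robot can recognize as "the same"; it does not generalize to new ones, which matches the limits described above.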
Some robots can interact socially. Kismet, a robot at M.I.T.'s Artificial Intelligence Lab, recognizes human body language and voice inflection and responds appropriately. Kismet's creators are interested in how humans and babies interact, based only on tone of speech and visual cues. This low-level interaction could be the foundation of a human-like learning system.
Kismet and other humanoid robots at the M.I.T. AI Lab operate using an unconventional control structure. Instead of directing every action using a central computer, the robots control lower-level actions with lower-level computers. The program's director, Rodney Brooks, believes this is a more accurate model of human intelligence. We do most things automatically; we don't decide to do them at the highest level of consciousness.
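The idea that low-level actions are handled without consulting a central decision-maker can be caricatured in a single process. This is not Kismet's actual control software: the behaviours, the sensor keys, and the priority order are hypothetical, and on the robots described the layers run on separate processors rather than in one loop.

```python
from typing import Optional

# Each behaviour maps a sensor snapshot to a motor command, or None to stay silent.


def avoid_collision(sensors: dict) -> Optional[str]:
    # Low-level reflex: reacts directly to the bump sensor, no planning involved.
    return "back_off" if sensors.get("bump") else None


def track_face(sensors: dict) -> Optional[str]:
    # Mid-level behaviour: turns toward a detected face (negative x = left of centre).
    face_x = sensors.get("face_x")
    if face_x is None:
        return None
    return "turn_left" if face_x < 0 else "turn_right"


def pursue_goal(sensors: dict) -> Optional[str]:
    # Top level only expresses a goal; it acts when nothing more urgent fires.
    return "explore"


LAYERS = [avoid_collision, track_face, pursue_goal]  # most urgent first


def control_step(sensors: dict) -> str:
    """Return the command of the first behaviour that has something to say."""
    for behaviour in LAYERS:
        command = behaviour(sensors)
        if command is not None:
            return command
    return "idle"


if __name__ == "__main__":
    print(control_step({"bump": True, "face_x": -3}))
```

The reflex wins without the goal layer ever deciding anything, which is the point Brooks makes about doing most things automatically.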
The real challenge of AI is to understand how natural intelligence works. Developing AI isn't like building an artificial heart -- scientists don't have a simple, concrete model to work from. We do know that the brain contains billions and billions of neurons, and that we think and learn by establishing electrical connections between different neurons. But we don't know exactly how all of these connections add up to higher reasoning, or even low-level operations. The complex circuitry seems incomprehensible.
Because of this, AI research is largely theoretical. Scientists hypothesize on how and why we learn and think, and they experiment with their ideas using robots. Brooks and his team focus on humanoid robots because they feel that being able to experience the world like a human is essential to developing human-like intelligence. It also makes it easier for people to interact with the robots, which potentially makes it easier for the robot to learn.
Just as physical robotic design is a handy tool for understanding animal and human anatomy, AI research is useful for understanding how natural intelligence works. For some roboticists, this insight is the ultimate goal of designing robots. Others envision a world where we live side by side with intelligent machines and use a variety of lesser robots for manual labor, health care and communication. A number of robotics experts predict that robotic evolution will ultimately turn us into cyborgs -- humans integrated with machines. Conceivably, people in the future could load their minds into a sturdy robot and live for thousands of years!
In any case, robots will certainly play a larger role in our daily lives in the future. In the coming decades, robots will gradually move out of the industrial and scientific worlds and into daily life, in the same way that computers spread to the home in the 1980s.
The best way to understand robots is to look at specific designs.
Smart Servo: The Difference Between Smart And Regular Servos
Robots keep evolving, and so do the components used to make these new stronger, faster and more precise robots. One of those new components, present in more advanced robots, is the smart servo motor, also often called a robot servo motor. In this article, we will explain the main differences between regular servos and the new smart servos.
Control Method And Feedback
The main difference between regular and smart servos, is the way they are controlled. With regular servos, communication is unidirectional. The controller unit (Arduino, Raspberry Pi or any other dedicated control unit like the SSC-32U or the BotBoarduino) sends the position command to the servos using a Pulse Width Modulated (PWM) signal. The servo reads this PWM value and moves the servo to the desired position according to the duty cycle of the signal. This is pretty simple, but also pretty limiting. There is no way to get feedback on the actual position. One of the good things with this is that the PWM signal is universal and that all servos, without restriction of brand or model, will respond to the same signal. There are always a few exceptions, of course. For example, continuous or winch servos will react differently, but apart from a few specialty servos, the majority of them are interchangeable.
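The one-way PWM command can be sketched as a mapping from angle to pulse width. The timing values are assumptions based on common hobby-servo practice (a 20 ms frame, 1.0 ms to 2.0 ms pulses spanning -90 to +90 degrees); many servos use wider pulse ranges, so check the datasheet. Note that nothing comes back: the controller only ever writes.

```python
PERIOD_US = 20_000     # assumed 50 Hz frame, common for hobby servos
MIN_PULSE_US = 1_000   # assumed pulse width for -90 degrees
MAX_PULSE_US = 2_000   # assumed pulse width for +90 degrees


def angle_to_pulse_us(angle_deg: float) -> float:
    """Map -90..+90 degrees linearly onto the assumed pulse range, clamping outside it."""
    angle = max(-90.0, min(90.0, angle_deg))
    return MIN_PULSE_US + (angle + 90.0) / 180.0 * (MAX_PULSE_US - MIN_PULSE_US)


def duty_cycle_percent(pulse_us: float) -> float:
    """Fraction of each frame during which the signal is high, as a percentage."""
    return 100.0 * pulse_us / PERIOD_US


if __name__ == "__main__":
    for angle in (-90, 0, 90):
        pulse = angle_to_pulse_us(angle)
        print(angle, pulse, duty_cycle_percent(pulse))
```

The clamp mirrors the typical +/- 90 degree limit of regular servos.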
Instead of using PWM signals to control the position, smart servos have traded this unidirectional technology for serial communication, which allows for bi-directional communication. Unfortunately, the smart servo's software protocol and hardware implementation of the serial communication will vary depending on the manufacturer, the lineup and grade of servos. The most popular hardware protocols are: TTL Half-Duplex, TTL Full-Duplex and RS-485. This means that robot designers need to choose servos of the same brand and lineup because they are not all compatible with each other.
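Because each manufacturer defines its own software protocol, the packet layout below is only a hypothetical example, loosely modelled on layouts some servo makers use: a two-byte header, the servo ID, a length byte, an instruction code, parameters, and a checksum. The instruction code and parameter values are arbitrary placeholders, not commands for any real servo.

```python
HEADER = bytes([0xFF, 0xFF])  # assumed start-of-packet marker


def checksum(body: bytes) -> int:
    """Inverted low byte of the sum of ID, length, instruction and parameters."""
    return (~sum(body)) & 0xFF


def build_packet(servo_id: int, instruction: int, params: bytes = b"") -> bytes:
    """Frame a command; the length byte counts the instruction, params and checksum."""
    body = bytes([servo_id, len(params) + 2, instruction]) + params
    return HEADER + body + bytes([checksum(body)])


def parse_packet(packet: bytes):
    """Validate header, length and checksum; return (servo_id, code, params)."""
    if len(packet) < 6 or packet[:2] != HEADER:
        raise ValueError("bad header or packet too short")
    length = packet[3]
    if len(packet) != 4 + length:
        raise ValueError("length field does not match packet size")
    body, received = packet[2:3 + length], packet[3 + length]
    if checksum(body) != received:
        raise ValueError("checksum mismatch")
    return body[0], body[2], bytes(body[3:])


if __name__ == "__main__":
    pkt = build_packet(servo_id=1, instruction=0x02, params=bytes([0x24, 0x02]))
    print(pkt.hex(" "), parse_packet(pkt))
```

The checksum lets either side reject a corrupted packet, which matters once the servo starts sending data back as well as receiving it.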
Fortunately, there are some advantages to using smart servos. One of them is the possibility of getting feedback, most notably position feedback. Although this feature may not be very important in standard RC applications like cars and other vehicles, it is a tremendous advantage in the robotic world. If you want to build a robot arm, knowing the actual arm position in the real world by having access to the position feedback of each joint is a valuable asset. For example, if the controller is sending a +215 degrees position command and then reads the position value of 110 degrees, the controller will be able to know that something is wrong. An object may be blocking the path of your robot arm or it might be trying to lift an object that is too heavy. With that precious information, the controller is able to react and correct the position to prevent burning out the servo. Such a thing would not be possible with regular servos without using external sensors, thus adding payload and taking up space on the robot. If you are familiar with standard servos, you may think there is an error above with the +215 degrees position, but there is none. Unlike most regular servos, the majority of smart servos are not limited to +/- 90 degrees (180 degrees total).
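The +215 versus 110 degree example suggests a simple check a controller could run on the position feedback. The sketch assumes readings arrive periodically; the 5 degree tolerance and the 1 degree "no progress" threshold are hypothetical values that would need tuning for a real arm.

```python
def joint_state(commanded_deg, readings_deg, tolerance_deg=5.0, min_progress_deg=1.0):
    """Classify a joint as 'reached', 'moving' or 'stalled' from recent position readings."""
    latest = readings_deg[-1]
    if abs(commanded_deg - latest) <= tolerance_deg:
        return "reached"
    # Far from the target and no longer moving: blocked path or load too heavy.
    if len(readings_deg) >= 2 and abs(latest - readings_deg[-2]) < min_progress_deg:
        return "stalled"
    return "moving"


if __name__ == "__main__":
    print(joint_state(215.0, [100.0, 108.0, 110.0]))         # still moving
    print(joint_state(215.0, [100.0, 108.0, 110.0, 110.0]))  # stuck at 110 degrees
```

On "stalled", a controller might lower the target or reduce torque instead of letting the motor strain; with a regular servo the controller would never see the 110 degree reading at all.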
Another advantage of serial communication is the way the servos themselves are connected together. Standard servos need to be connected directly to the main controller. This means that different cable lengths are required to achieve a clean design. Smart servos, on the other hand, can be daisy-chained. What this means is that each servo has two connectors and that instead of connecting all the servos directly to the controller, each servo may be used as a pass-through. For example, in the construction of a robot arm, the first servo would be connected to the controller directly, the second one would be connected to the first one, the third one to the second one and so on. Because all the servos share the same communication line, each servo also needs a unique address so that control commands can be directed at one particular servo on the line.
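The need for unique addresses on a shared line can be shown with a small simulation. Real servos receive bytes on a wire rather than method calls, so `ChainedServo` and `SharedBus` are hypothetical stand-ins: every servo sees every message, and only the one whose ID matches acts on it.

```python
class ChainedServo:
    def __init__(self, servo_id):
        self.servo_id = servo_id
        self.position = 0.0

    def on_message(self, target_id, position):
        # Every servo on the shared line sees every message but obeys only its own ID.
        if target_id == self.servo_id:
            self.position = position
            return True
        return False


class SharedBus:
    def __init__(self, servos):
        ids = [s.servo_id for s in servos]
        if len(ids) != len(set(ids)):
            raise ValueError("duplicate servo IDs on the bus")
        self.servos = servos

    def send(self, target_id, position):
        """Put a command on the line; return how many servos acted on it."""
        return sum(s.on_message(target_id, position) for s in self.servos)


if __name__ == "__main__":
    arm = [ChainedServo(i) for i in (1, 2, 3)]  # shoulder, elbow, wrist
    bus = SharedBus(arm)
    bus.send(2, 45.0)
    print([s.position for s in arm])
```

If two servos shared an ID, both would obey the same command, which is why the sketch refuses to build such a bus.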
Configuration
Standard servos simply don't have the option of being configurable. There is another type of servo, called a digital servo, that can be programmed using dedicated programmers to modify parameters such as centre and end point, direction of rotation, maximum speed, dead band and fail-safe, to name the most common ones. Smart servos offer a variety of configurations. As with the communication protocol, the configurable parameters will change depending on the brands and lineups, but most of them will have the ability to configure the maximum speed, minimum and maximum position and current safety limit, to name just a few of the parameters. Some of them can even be configured to act as a continuous rotation servo. The configuration is done via the serial communication, so no special programmer is required. The other advantage is that you can modify the configuration in the system itself; there is no need to unplug the servos and program them one by one.
Versatility
Because serial communication is used, different types of input commands can also be sent to the smart servos, compared to only a position command for the regular servos. Most smart servos will not only accept position commands, but also speed and time commands. This means that it is possible to command a servo to run at a target speed of 10 degrees per second for 2.5 seconds. In regular servos, the speed is pretty much dictated by the speed at which the PWM signal is changed. To achieve the same speed-and-time control, all the calculations need to be done inside the controller and the PWM needs to be constantly updated to match the desired speed. Because regular servos lack feedback, if the PWM signal is changing too fast, the servo will simply lag behind and there will be no way for the controller to know without using additional sensors such as potentiometers or encoders. Speaking of position sensing, it is also possible with smart servos to turn off the motor drive. This makes the servo go limp and rotate freely while keeping the ability to read its live position. This is a particularly useful feature for robots, where you often need to calibrate a specific position: turn off the motor control, place the robot in your calibration position and then simply record all the current positions.
Precision
Because smart servos are the latest technology, most of them benefit from an upgrade in precision. Although some of the lowest-grade smart servos might have precision equivalent to regular servos, most common entry-level smart servos get a precision bump over their older brother. The higher the grade of the servo, the higher the precision, but also the higher the price.
The Downside Of Smart Servos
Because smart servos are still quite a new technology compared to regular servos, there is still no standard. Every company does things its own way, which makes it harder for robot builders, because once they choose a brand, they are pretty much stuck with it. This is true for the communication protocol, but it is also true for the mechanical aspect of the servos. All companies have their own form factor, brackets and horns. Also, because they all chose mounting patterns that are not compatible with the previous generation of servos, for which a ton of horns, brackets, mounts and pulleys exist, the number of parts for building robots is pretty limited. The other downside is that they do not cover the full range covered by regular servos. Nano and micro versions are still nowhere to be found in the smart category. There is also no option for builders who would like to save a few dollars and use cheaper servos for one part of their robot that does not require as much precision or force as the rest of the robot.
What Is The Best Servo Type For My Robot?
Smart servos are the latest, most advanced, most precise type of servos out there, so should I use them for all my robots? Well, not really. Yes, smart servos have plenty of interesting features, but they may also be overkill for some projects. Here is a list of top reasons to consider when deciding which type of servo to use in your new project.
Top 5 reasons for choosing regular servos instead of smart servos:
- No programming skills required in most cases
- Working with a low budget
- Basic project where precision is not an issue
- No feedback required
- Small robot size (smart servos not available in micro format yet)
Top 5 reasons for choosing smart servos over regular servos:
- Movement feedback is required
- More than 180 degrees is required
- More precision required
- Form factor (smart servos' mounting patterns are usually better suited to robots)
- More versatility required
IoT devices (internet of things devices)
IoT devices, or any of the many things in the internet of things, are nonstandard computing devices that connect wirelessly to a network and have the ability to transmit data.
IoT involves extending internet connectivity beyond standard devices, such as desktops, laptops, smartphones and tablets, to any range of traditionally dumb or non-internet-enabled physical devices and everyday objects. Embedded with technology, these devices can communicate and interact over the internet, and they can be remotely monitored and controlled.
IoT device examples and applications
Connected devices are part of a scenario in which every device talks to other related devices in an environment to automate home and industry tasks, and to communicate usable sensor data to users, businesses and other interested parties. IoT devices are meant to work in concert for people at home, in industry or in the enterprise. As such, the devices can be categorized into three main groups: consumer, enterprise and industrial.
Consumer connected devices include smart TVs, smart speakers, toys, wearables and smart appliances. Smart meters, commercial security systems and smart city technologies -- such as those used to monitor traffic and weather conditions -- are examples of industrial and enterprise IoT devices. Other technologies, including smart air conditioning, smart thermostats, smart lighting and smart security, span home, enterprise and industrial uses.
In a smart home, for example, a user arrives home and his car communicates with the garage to open the door. Once inside, the thermostat is already adjusted to his preferred temperature, and the lighting is set to a lower intensity and his chosen color for relaxation, as his pacemaker data indicates it has been a stressful day.
In the enterprise, smart sensors located in a conference room can help an employee locate and schedule an available room for a meeting, ensuring the proper room type, size and features are available. When meeting attendees enter the room, the temperature will adjust according to the occupancy, and the lights will dim as the appropriate PowerPoint loads on the screen and the speaker begins his presentation.
On a plant floor, an assembly line machine outfitted with sensors will provide sensor data to the plant operator, informing her of anomalies and predicting when parts will need to be replaced. Such information can prevent unexpected downtime, along with lost productivity and profits.
In the field, such notifications can alert users to what is wrong, as well as the parts needed to fix a problem, preventing the need to send a field service worker out to diagnose an issue, only to waste her time driving to a warehouse, finding the correct part and returning to the site.
IoT device management
A number of challenges can hinder the successful deployment of an IoT system and its connected devices, including security, interoperability, power/processing capabilities, scalability and availability. Many of these can be addressed with IoT device management either by adopting standard protocols or using services offered by a vendor.
Device management helps companies integrate, organize, monitor and remotely manage internet-enabled devices at scale, offering features critical to maintaining the health, connectivity and security of the IoT devices along their entire lifecycles. Such features include:
- Device registration
- Device authentication/authorization
- Device configuration
- Device provisioning
- Device monitoring and diagnostics
- Device troubleshooting
IoT device management services and software are also available from vendors including Amazon, Bosch Software Innovations GmbH, Microsoft, Software AG and Xively.
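A few of the device-management features listed above (registration, provisioning, authentication, configuration and monitoring) can be sketched with an in-memory registry. This is a hypothetical illustration, not any vendor's service: it uses a per-device shared secret and an HMAC challenge-response, built only on Python's standard `hmac`, `hashlib` and `secrets` modules.

```python
import hashlib
import hmac
import secrets


class DeviceRegistry:
    """Hypothetical in-memory registry covering a few device-management features."""

    def __init__(self):
        self._keys = {}    # registration: device_id -> shared secret
        self._config = {}  # configuration: device_id -> settings
        self._status = {}  # monitoring: device_id -> last reported telemetry

    def register(self, device_id):
        if device_id in self._keys:
            raise ValueError(f"{device_id} is already registered")
        key = secrets.token_bytes(32)
        self._keys[device_id] = key
        self._config[device_id] = {}
        return key  # provisioning: this secret is written to the device once

    def new_challenge(self):
        return secrets.token_bytes(16)

    def authenticate(self, device_id, challenge, response):
        key = self._keys.get(device_id)
        if key is None:
            return False
        expected = hmac.new(key, challenge, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)  # constant-time comparison

    def configure(self, device_id, **settings):
        self._config[device_id].update(settings)
        return dict(self._config[device_id])

    def report(self, device_id, **telemetry):
        self._status[device_id] = telemetry


def device_answer(key, challenge):
    """What the device computes with its provisioned secret."""
    return hmac.new(key, challenge, hashlib.sha256).digest()
```

The secret never crosses the network after provisioning; only fresh challenges and their HMACs do.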
IoT device connectivity and networking
The networking, communication and connectivity protocols used with internet-enabled devices largely depend on the specific IoT application deployed. Just as there are many different IoT applications, there are many different connectivity and communications options.
Communications protocols include CoAP, DTLS and MQTT, among others. Network and wireless protocols include IPv6, LPWAN, Zigbee, Bluetooth Low Energy, Z-Wave, RFID and NFC. Cellular, satellite, Wi-Fi and Ethernet can also be used.
Each option has its tradeoffs in terms of power consumption, range and bandwidth, all of which must be considered when choosing connected devices and protocols for a particular IoT application.
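As one concrete example of how a communications protocol shapes an IoT application, MQTT devices publish to hierarchical topics and subscribers use filters with wildcards: `+` matches exactly one level and `#` matches the current level and everything below it. The function below is a simplified sketch of those matching rules; it does not validate every malformed filter, and the example topics are hypothetical.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Simplified MQTT-style matching: '+' = exactly one level, '#' = this level and below."""
    # Topics starting with '$' are not matched by filters that start with a wildcard.
    if topic.startswith("$") and topic_filter[:1] in ("+", "#"):
        return False
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":
            return i == len(f_levels) - 1  # '#' is only valid as the last level
        if i >= len(t_levels):
            return False
        if f != "+" and f != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


if __name__ == "__main__":
    print(topic_matches("home/+/temperature", "home/kitchen/temperature"))
```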
To share the sensor data they collect, IoT devices connect to an IoT gateway or another edge device where data can either be analyzed locally or sent to the cloud for analysis.
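The choice between analyzing locally and sending data to the cloud can be sketched as a gateway function that summarizes a batch of readings and flags out-of-range values, so only a small payload needs to leave the site. The function name, the acceptable range, and the idea of forwarding only summary plus anomalies are illustrative assumptions.

```python
from statistics import mean


def process_at_edge(readings, low, high):
    """Analyze a batch of sensor readings locally at the gateway.

    Returns a compact summary plus any out-of-range readings; in this sketch only
    that payload, not the raw batch, would be forwarded to the cloud.
    """
    if not readings:
        return {"summary": {"count": 0}, "anomalies": []}
    anomalies = [r for r in readings if not (low <= r <= high)]
    summary = {
        "count": len(readings),
        "mean": mean(readings),
        "min": min(readings),
        "max": max(readings),
    }
    return {"summary": summary, "anomalies": anomalies}


if __name__ == "__main__":
    print(process_at_edge([20.0, 22.0, 24.0, 38.0], low=15.0, high=30.0))
```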
IoT device security
The interconnection of traditionally dumb devices raises a number of questions in relation to security and privacy. As is often the case, IoT technology has moved more quickly than the mechanisms available to safeguard the devices and their users.
Researchers have already demonstrated remote hacks on pacemakers and cars, and, in October 2016, a large distributed denial-of-service attack launched by the Mirai botnet affected DNS servers on the east coast of the United States, disrupting services worldwide -- an issue traced back to hackers infiltrating networks through IoT devices, including wireless routers and connected cameras.
However, safeguarding IoT devices and the networks they connect to can be challenging due to the variety of devices and vendors, as well as the difficulty of adding security to resource-constrained devices. In the case of the Mirai botnet, the problem was traced back to the use of default passwords on the hacked devices. Strong passwords, authentication/authorization and identity management, network segmentation, encryption, and cryptography are all suggested IoT security measures.
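Since the Mirai problem was traced to default passwords, a basic first audit is to flag devices whose credentials appear on a list of known defaults. The list and inventory below are hypothetical; a real audit would use vendor-published defaults and would test logins against the devices rather than keep passwords in plain text.

```python
# Hypothetical list of factory-default username/password pairs.
KNOWN_DEFAULTS = {
    ("admin", "admin"),
    ("admin", "password"),
    ("root", "root"),
    ("root", "12345"),
}


def audit(inventory):
    """Return the names of devices still using a known default username/password pair."""
    return sorted(
        name for name, user, password in inventory
        if (user, password) in KNOWN_DEFAULTS
    )


if __name__ == "__main__":
    devices = [
        ("camera-1", "admin", "admin"),
        ("router", "admin", "S3cure!pass"),
        ("dvr", "root", "12345"),
    ]
    print(audit(devices))
```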
Concerned by the dangers posed by the rapidly growing IoT attack surface, the FBI released the public service announcement FBI Alert Number I-091015-PSA in September 2015, which is a document outlining the risks of IoT devices, as well as protections and defense recommendations.
In August 2017, the U.S. Senate introduced the IoT Cybersecurity Improvement Act, a bill addressing security issues associated with IoT devices. While it is a start, the bill only requires internet-enabled devices purchased by the federal government to meet minimum requirements, not the industry as a whole. However, it is being viewed as a starting point which, if adopted across the board, could pave the way to better IoT security industry-wide.
IoT device trends and anticipated growth
Gartner estimated the total number of IoT devices in use to have reached 8.4 billion in 2017, a 31% increase over 2016. And the estimations for future growth of IoT devices have been fast and furious.
At the high end of the scale, Intel projected internet-enabled device penetration to grow from 2 billion in 2006 to 200 billion by 2020, which equates to nearly 26 smart devices for each human on Earth. A little more conservative, IHS Markit said the number of connected devices will be 75.4 billion in 2025 and 125 billion by 2030.
Other companies have tempered their numbers, taking smartphones, tablets and computers out of the equation. Gartner estimated 20.8 billion connected things will be in use by 2020, with IDC coming in at 28.1 billion and BI Intelligence at 24 billion.
Gartner estimated the total spend on IoT devices and services at nearly $2 trillion in 2017, with IDC projecting spending to reach $772.5 billion in 2018, 14.6% more than the $674 billion it estimated to be spent in 2017, with it hitting $1 trillion in 2020 and $1.1 trillion in 2021.
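The spending figures above can be checked with simple arithmetic. The sketch computes the percentage increase from IDC's $674 billion (2017) to $772.5 billion (2018), and the compound annual growth rate implied by reaching $1 trillion in 2020; the CAGR is derived here, not a figure quoted by IDC.

```python
def percent_increase(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


def cagr_percent(start: float, end: float, years: float) -> float:
    """Compound annual growth rate implied by moving from start to end over `years`."""
    return ((end / start) ** (1.0 / years) - 1.0) * 100.0


if __name__ == "__main__":
    print(round(percent_increase(674, 772.5), 1))     # 14.6, matching the 14.6% quoted
    print(round(cagr_percent(772.5, 1000, 2), 1))     # 13.8 per year for 2018 to 2020
```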